firebb

If we use a sparse matrix to compress the data, does that mean we need parallel sparse matrix multiplication instead of the dense matrix multiplication we discussed at length in this lecture? I think the optimizations for sparse matrix multiplication would be quite different.

Master

A similar idea is applied in the Project Adam paper, which updates weights asynchronously and tolerates inconsistencies, relying on the fact that neural network training is resilient to such errors.
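
To illustrate the pattern (this is a minimal Hogwild-style sketch, not Project Adam's actual implementation), here several threads update a shared weight vector with no locks, so reads can be stale and interleaved writes can be lost, yet training still converges. The problem setup and all names are invented for the example:

```python
import threading
import numpy as np

# Toy linear regression: recover w_true from noisy samples.
rng = np.random.default_rng(0)
w_true = rng.random(10)
X = rng.random((10000, 10))
y = X @ w_true + 0.01 * rng.standard_normal(10000)

w = np.zeros(10)  # shared weights, updated by every worker without locks

def worker(rows, lr=0.05, epochs=5):
    for _ in range(epochs):
        for i in rows:
            # Racy read-modify-write: another thread may update w between
            # the read and the write, and that update can be lost. Training
            # tolerates the inconsistency, which is the point of the scheme.
            grad = (X[i] @ w - y[i]) * X[i]
            w[:] = w - lr * grad

threads = [threading.Thread(target=worker, args=(range(t, 10000, 4),))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("max weight error:", np.abs(w - w_true).max())
```

(In CPython the GIL limits how much true parallelism the threads get, but the lock-free, inconsistent-update pattern is the same one the paper exploits.)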

Metalbird

@firebb, there has been a ton of work on optimizing sparse matrix multiplication; the issue is that there is almost always a cost relative to dense matrix multiplication. By not storing the zero elements you lose the regular indexing of a dense layout and have to rely on auxiliary index structures to locate the nonzero values. This can lower arithmetic intensity, since each nonzero requires extra memory accesses just to find it. That said, if the matrix is sparse enough, sparse matrix multiplication is absolutely worth it, but the break-even point is problem specific. See the sketch below for where the extra accesses come from.
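
To make the indexing overhead concrete, here is a minimal single-threaded CSR (compressed sparse row) matrix-vector multiply in Python/numpy, a sketch of the data structure rather than a tuned parallel kernel. Note the indirect load through col_idx for every nonzero: that is the extra memory traffic mentioned above.

```python
import numpy as np

def dense_to_csr(A):
    """Convert a dense matrix to CSR arrays (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x using CSR storage."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: one extra load of col_idx[k] per nonzero,
            # replacing the implicit indexing of a dense row.
            y[i] += values[k] * x[col_idx[k]]
    return y

# Quick check against the dense product on a ~5%-nonzero matrix.
rng = np.random.default_rng(0)
A = rng.random((100, 100)) * (rng.random((100, 100)) < 0.05)
x = rng.random(100)
assert np.allclose(csr_matvec(*dense_to_csr(A), x), A @ x)
```

Real systems use tuned libraries (e.g., scipy.sparse or cuSPARSE) rather than Python loops, but the indirection is inherent to the format; that is why the matrix has to be quite sparse before the saved flops outweigh the irregular memory accesses.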