Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Previous | Next --- Slide 22 of 41

huehue

Spatial locality is good in A, but it is bad in B because we access a different row in B in every iteration, so we need to load a new row into cache every time. Temporal locality in C is good because we keep accumulating into the same value, but it is bad in A because we need to reread the same values of A multiple times to compute different elements of C, and A's row could have been evicted from cache by the time we want to reread it by one of B's rows.

yikesaiting

Meanwhile, sparse Matrix multiplication remains a open area of speeding up machine learning.

teamG

When I studied this slide earlier for the exam, I had trouble reminding myself why this was a problem.

Luckily, there was a very clear explanation in one of the 213 recitations.

The explanation is at slide 25, and on slide 26 it explains why the blocked version on the next page of this powerpoint is more cache friendly:

https://www.cs.cmu.edu/~213/recitations/recitation07-cachelab.pdf

On top of what @huehue said, the slides mention that there may be capacity misses too.

Another more mathematical analysis of why the blocked version is better and this normal version is worse can also be seen in these slides: http://www.cs.berkeley.edu/~demmel/cs267_Spr99/Lectures/Lect_02_1999b.pdf