Spatial locality is good in A but bad in B: every iteration of the inner loop accesses a different row of B, so we pull a new cache line into the cache every time. Temporal locality is good in C because we keep accumulating into the same value, but bad in A because we have to reread the same row of A to compute different elements of C, and by the time we reread it, that row may already have been evicted from the cache by B's rows.
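To make the access pattern concrete, here's a minimal sketch of the naive ijk multiply being described (the matrix size, row-major layout, and names are my assumptions, not from the slides):

```c
#include <stddef.h>

#define N 4  // assumed small size just for illustration

// Naive ijk matrix multiply: C = A * B (row-major).
void matmul_naive(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;  // good temporal locality: one accumulator for C[i][j]
            for (size_t k = 0; k < N; k++) {
                // A[i][k]: stride-1 walk along a row -> good spatial locality
                // B[k][j]: stride-N walk down a column -> a different row of B
                //          every iteration, so a new cache line each time
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
}
```

The comments mark exactly the two problems above: the column walk through B, and the fact that A's row gets streamed through the cache once per output column.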
Meanwhile, sparse matrix multiplication remains an open area for speeding up machine learning.
When I studied this slide earlier for the exam, I had trouble recalling why this was a problem.
Luckily, there was a very clear explanation in one of the 213 recitations.
The explanation is on slide 25, and slide 26 explains why the blocked version on the next page of this PowerPoint is more cache friendly:
On top of what @huehue said, the slides mention that there may be capacity misses too.
Another, more mathematical analysis of why the blocked version is better and this naive version is worse can also be seen in these slides:
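For reference, a blocked version along the lines the slides describe might look like the sketch below (the block size, matrix size, and the assumption that N is a multiple of BSIZE are mine; real code would pick BSIZE so that three blocks fit in cache):

```c
#include <stddef.h>

#define N 8
#define BSIZE 4  // assumed block size for illustration

// Blocked matrix multiply: C = A * B (row-major), assuming N % BSIZE == 0.
// Each BSIZE x BSIZE block of A and B is reused BSIZE times while it is
// still hot in the cache, instead of being streamed in again for every
// element of C.
void matmul_blocked(const double A[N][N], const double B[N][N],
                    double C[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0;

    for (size_t ii = 0; ii < N; ii += BSIZE)
        for (size_t jj = 0; jj < N; jj += BSIZE)
            for (size_t kk = 0; kk < N; kk += BSIZE)
                // multiply the (ii,kk) block of A by the (kk,jj) block of B
                for (size_t i = ii; i < ii + BSIZE; i++)
                    for (size_t j = jj; j < jj + BSIZE; j++) {
                        double sum = C[i][j];
                        for (size_t k = kk; k < kk + BSIZE; k++)
                            sum += A[i][k] * B[k][j];
                        C[i][j] = sum;
                    }
}
```

The payoff is exactly the capacity-miss point above: the working set per block triple is 3 * BSIZE^2 doubles rather than whole rows and columns, so blocks stay resident while they are reused.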