Previous | Next --- Slide 24 of 41
jhibshma

While this solution is very cache efficient, it has a worse asymptotic complexity than some matrix multiplication algorithms. Is it generally the case that the cache savings with a method like this outweigh the benefits of switching to something with better asymptotic complexity?

efficiens

Yes, I think so too, because the overhead of so many for loops might be too high to offset the savings achieved from caching.

RX

@jhibshma @efficiens, I think you are talking about different things. @jhibshma is saying that there exist matrix multiplication algorithms with lower asymptotic complexity, like this, but such an algorithm may not be cache-friendly, so its overall performance can be worse than the one on this slide.

@efficiens is saying that too many loops will increase the number of instructions spent on loop logic (like jblock2 < M, jblock2 += L2_BLOCKSIZE_J). However, compared to the O(N^3) algorithm, the asymptotic complexity is the same: the extra loop-control work is a constant factor, not a higher-order term.

krillfish

Considering that this algorithm is just another way of writing the previous algorithm, the asymptotic complexity is the same even though there are more for loops. As for the article that @RX linked to, I think it's blank?

Also, since reading from cache is significantly faster than going out to memory on a miss, I don't think the extra for-loop overhead could cost enough to cancel out the caching benefit.