The right-hand side chooses a workspace small enough that capacity misses don't happen. However, the one on the left-hand side has very poor cache performance. I'm wondering whether it could lead to even worse overall performance?
Is there any way we can generalize our code to take advantage of the specs of the cache on the system it is running on? It seems that our code has to be designed with a particular cache specification in mind in order to minimize misses. I was wondering if there is a way to write general code, for any system with a cache, that minimizes misses using the method mentioned above.
Professor Bryant's 213 web aside on blocking seems relevant to this discussion.
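To make the blocking idea concrete, here is a minimal sketch in C of one way to avoid hard-coding the cache spec: treat the block size as a parameter and derive it from a cache size supplied at runtime. The names `pick_block` and `matmul_blocked` are hypothetical, and the rule "three tiles of doubles must fit in the cache" is a rough assumption, not a tuned heuristic. It uses matrix multiply as the example rather than the exact computation on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest block edge b such that three b x b tiles
 * of doubles (one each from A, B, and C) fit in cache_bytes.
 * Returns 1 if even that does not fit. */
static size_t pick_block(size_t cache_bytes)
{
    size_t b = 1;
    while (3 * (b + 1) * (b + 1) * sizeof(double) <= cache_bytes)
        b++;
    return b;
}

/* C += A * B for n x n row-major matrices, processed in block x block
 * tiles so that each tile is reused while it is still cache-resident. */
static void matmul_blocked(size_t n, size_t block,
                           const double *A, const double *B, double *C)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        size_t i_end = ii + block < n ? ii + block : n;
        for (size_t kk = 0; kk < n; kk += block) {
            size_t k_end = kk + block < n ? kk + block : n;
            for (size_t jj = 0; jj < n; jj += block) {
                size_t j_end = jj + block < n ? jj + block : n;
                /* Work on one tile; the innermost loop walks rows of B
                 * and C contiguously. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
        }
    }
}
```

The caller would pass something like `matmul_blocked(n, pick_block(cache_bytes), A, B, C)`. Where `cache_bytes` comes from is system-specific: on Linux with glibc, `sysconf(_SC_LEVEL1_DCACHE_SIZE)` is one possible source, though it can return 0 or -1 on some systems, so a fallback value is still needed. In practice the best block size also depends on associativity and line size, so measuring a few candidates is probably safer than trusting the formula.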
@misaka-10032 if you look two slides earlier, you can see that if we iterate across every row in standard row-major order, the cells on the left have been evicted from the cache by the time we reach the next row. If we look at the cells on the left for this access pattern, we actually only take one miss for every 5 hits.