Previous | Next --- Slide 15 of 63
Back to Lecture Thumbnails
mpile13

So, the advantage in the second arrangement is that the third row is already there in the cache when you want to store the information, so there would be fewer cache misses.

The optimization in the third arrangement is that almost all the information is there for you, so there are almost no cache misses except for the cold misses.

Is there any advantage to the first arrangement?

bstan

I think the three working sets mentioned here are just different amounts of caching that can be done. What you pointed out about the second and third arrangement are correct, but in situations where there are cache size limitations, we would have to go with the first arrangement, which is advantageous when using a small cache because the values needed to calculate for that particular cell are loaded into the cache.

sbly

I agree with @mpile13; I don't see any benefit to being able to fit the first set into your cache. To update the red dot, you only need to access each of the black dots once, so having them in your cache doesn't help any. Having data in your cache is only helpful when you're accessing it repeatedly.

bxb

Caches typically bring in data in block amounts, which means if the block size was large enough we could access various elements around the grid such that it eventually pulls in something like the smallest working set for a cell.

cardiff

@sbly You mean there is not much benefit to caching /just/ the first set? Even though the black dots are only used once when updating the red dot, they are used other times to update other dots in the current part of the partition.

spilledmilk

I believe that the benefit to being able to fit the first set into your cache is that given the limitation that your cache can only fit 5 points (unlikely, but this is for hypothetical purposes), it is possible to arrange your data such that with one cold miss, all the data necessary for a single computation can be loaded.

However, given some other arrangement of the data, perhaps in row-major order, loading all the data necessary for a single computation might take 3 cache misses. This is important if the computation for each point is done in a different thread. In this case, each thread will not utilize the extra points in the cache, and there will be wasted memory bandwidth and cache space.