Slide View : 15-418/618 Spring 2014

Parallel Programming Case Studies

Slide 15 of 63

mpile13

So, the advantage of the second arrangement is that the rows you need are already in the cache when you move on to the next row, so there are fewer cache misses.

The optimization in the third arrangement is that almost all the data you need stays in the cache, so there are almost no cache misses except cold misses.
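
A toy simulation can make the three arrangements concrete. This is a hypothetical model, not the course's: a fully associative LRU cache that holds `capacity` grid elements (one element per line), fed by row-major sweeps of a 5-point stencil over an `n` x `n` grid. The function name `stencil_misses` and the chosen capacities (5 points, 3 rows, whole grid) are assumptions picked to mirror the three working sets.

```python
from collections import OrderedDict


def stencil_misses(n, capacity, iters=1):
    """Count cache misses for row-major sweeps of a 5-point stencil.

    Toy model (an assumption): a fully associative LRU cache holding
    `capacity` grid elements, one element per cache line.
    """
    cache = OrderedDict()  # keys are element addresses, ordered by recency
    misses = 0
    for _ in range(iters):
        for i in range(1, n - 1):          # interior rows only
            for j in range(1, n - 1):      # interior columns only
                # up, left, center, right, down: the reads for one update
                for r, c in ((i - 1, j), (i, j - 1), (i, j), (i, j + 1), (i + 1, j)):
                    addr = r * n + c       # row-major address
                    if addr in cache:
                        cache.move_to_end(addr)        # hit: now most recent
                    else:
                        misses += 1
                        cache[addr] = None
                        if len(cache) > capacity:
                            cache.popitem(last=False)  # evict least recent
    return misses


if __name__ == "__main__":
    n = 16
    for label, cap in [("1 element", 1), ("5 points", 5),
                       ("3 rows", 3 * n), ("whole grid", n * n)]:
        print(f"{label:>10}: {stencil_misses(n, cap, iters=2)} misses")
```

With a 1-element cache every read misses (5 per update). With room for the whole grid, only the first sweep misses, once per element touched (`n*n - 4`, since the four corners are never read). Real caches have multi-element lines and limited associativity, so these counts are only indicative.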

Is there any advantage to the first arrangement?

bstan

I think the three working sets mentioned here are just different amounts of data that can be cached. What you pointed out about the second and third arrangements is correct, but when cache size is limited, we would have to go with the first arrangement. It is advantageous with a small cache because the values needed to compute that particular cell are loaded into the cache.

sbly

I agree with @mpile13; I don't see any benefit to being able to fit the first set into your cache. To update the red dot, you only need to access each of the black dots once, so having them in your cache doesn't help any. Having data in your cache is only helpful when you're accessing it repeatedly.

bxb

Caches typically bring in data in whole blocks (cache lines), so if the block size were large enough, accessing elements around the grid would eventually pull in something like the smallest working set for a cell.
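
To sketch the block-size effect, the hypothetical helper `lines_per_update` below counts how many distinct cache lines one 5-point update touches in a row-major grid, for different line sizes measured in elements. It assumes the array starts on a line boundary, so element `k` lives in line `k // line_elems`.

```python
def lines_per_update(n, line_elems):
    """Average number of distinct cache lines touched by one 5-point update.

    Assumptions (hypothetical): the n x n grid is stored row-major starting
    at a line boundary, and element k falls in cache line k // line_elems.
    """
    total = 0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            reads = ((i - 1, j), (i, j - 1), (i, j), (i, j + 1), (i + 1, j))
            # Set of line indices = distinct lines this update needs
            total += len({(r * n + c) // line_elems for r, c in reads})
    return total / (n - 2) ** 2


if __name__ == "__main__":
    n = 16
    for line_elems in (1, 4, n, 2 * n, n * n):
        print(line_elems, round(lines_per_update(n, line_elems), 2))
```

With 1-element lines every update touches 5 lines; with lines of exactly one row (`line_elems == n`) it touches 3; with two rows per line it touches 2; and once a line covers the whole grid it touches 1.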

cardiff

@sbly You mean there is not much benefit to caching /just/ the first set? Even though each black dot is used only once when updating the red dot, it is used again to update other dots in the same part of the partition.
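
One way to weigh this is to measure reuse distance. The hypothetical helper `reuse_distances` below replays one row-major sweep and, for the five reads of a chosen point, reports the LRU stack distance: the number of distinct other elements read since that element's previous read. In an idealized fully associative LRU cache holding `C` elements, a read hits exactly when its distance is less than `C`.

```python
def reuse_distances(n, ti, tj):
    """LRU stack distance of each read made while updating grid point (ti, tj).

    Distance = number of distinct other elements read since the previous read
    of the same element (None = first read). Assumes one row-major sweep over
    the interior of an n x n grid with the 5-point stencil.
    """
    names = ("up", "left", "center", "right", "down")
    trace = []       # every address read so far, in order
    last_seen = {}   # address -> index of its most recent read in trace
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            reads = ((i - 1, j), (i, j - 1), (i, j), (i, j + 1), (i + 1, j))
            if (i, j) == (ti, tj):
                result = {}
                for name, (r, c) in zip(names, reads):
                    a = r * n + c
                    if a in last_seen:
                        # Distinct addresses read after the previous access to a
                        result[name] = len(set(trace[last_seen[a] + 1:]))
                    else:
                        result[name] = None
                    last_seen[a] = len(trace)
                    trace.append(a)
                return result
            for r, c in reads:
                a = r * n + c
                last_seen[a] = len(trace)
                trace.append(a)
    raise ValueError("(ti, tj) must be an interior point")


if __name__ == "__main__":
    print(reuse_distances(16, 5, 5))
```

For `n = 16` and point `(5, 5)`, the left and center reads have distance 3, so a 5-point cache catches them: those values were just read by the previous update. The up and right reads have distance 41, which fits within three rows (48 elements) but not within 5 points, and the down read is a first touch.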

spilledmilk

I believe the benefit of being able to fit the first set into your cache is this: if your cache can hold only 5 points (unlikely, but useful as a hypothetical), you can arrange your data so that a single cold miss loads all the data needed for one computation.

However, with some other arrangement of the data, such as row-major order, loading the data for a single computation might take 3 cache misses (one for the row above the point, one for its own row, and one for the row below). This matters if each point is computed by a different thread: no thread will use the extra points loaded into its cache, so memory bandwidth and cache space are wasted.
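
The layout argument can be sketched by comparing row-major order with a tiled layout, where the grid is split into `b` x `b` tiles that are each stored contiguously. Tiling is a standard technique swapped in here for illustration, not something the thread specifies; the helpers `row_major`, `tiled`, and `avg_lines` are hypothetical, lines are assumed to hold exactly one tile (`b*b` elements), and `n` is assumed divisible by `b`.

```python
def row_major(r, c, n, b):
    """Plain row-major address; b is unused but kept for a uniform signature."""
    return r * n + c


def tiled(r, c, n, b):
    """Hypothetical layout: b x b tiles stored contiguously, tiles in row-major order.

    Assumes n % b == 0.
    """
    tile = (r // b) * (n // b) + (c // b)
    return tile * b * b + (r % b) * b + (c % b)


def avg_lines(n, b, layout):
    """Average distinct lines (b*b elements each) touched per 5-point update."""
    line_elems = b * b
    total = 0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            reads = ((i - 1, j), (i, j - 1), (i, j), (i, j + 1), (i + 1, j))
            total += len({layout(r, c, n, b) // line_elems for r, c in reads})
    return total / (n - 2) ** 2


if __name__ == "__main__":
    n, b = 16, 4
    print("row-major:", avg_lines(n, b, row_major))
    print("tiled:    ", round(avg_lines(n, b, tiled), 2))
```

For `n = 16`, `b = 4`, row-major touches exactly 3 lines per update (each line is one row), while the tiled layout averages about 1.86: a point strictly inside a tile needs a single line, matching the one-miss case, and only points on tile edges pull in neighboring tiles.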
