Slide 7 of 23

Why is true sharing such a large source of cache misses? Do we really invalidate data that often? To check my understanding: what exactly is meant by true sharing cache misses? They are just misses on cache lines that were invalidated, right?


As I understand it, true sharing cache misses are when two processors actually access the same piece of data, and need to communicate to make sure that they have the most up-to-date version of it. False sharing is, I think, when they just happen to access different data in the same cache line (this is what padding can help with).
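To make the padding point concrete, here is a small sketch that checks whether per-thread counters end up on the same cache line. It assumes 64-byte lines and 4-byte counters, and the helper names (`cache_line`, `counter_addresses`, `shares_line`) are made up for illustration.

```python
LINE_SIZE = 64  # bytes per cache line (an assumption; the real value is hardware-dependent)

def cache_line(addr, line_size=LINE_SIZE):
    """Index of the cache line that contains byte address addr."""
    return addr // line_size

def counter_addresses(base, num_threads, stride):
    """Byte address of each thread's counter when counters sit stride bytes apart."""
    return [base + t * stride for t in range(num_threads)]

def shares_line(addrs, line_size=LINE_SIZE):
    """True if at least two of the addresses fall in the same cache line."""
    lines = [cache_line(a, line_size) for a in addrs]
    return len(set(lines)) < len(lines)

# Four 4-byte counters packed back to back: all of them land in line 0.
packed = counter_addresses(base=0, num_threads=4, stride=4)
# The same counters, each padded out to a full line: one line per thread.
padded = counter_addresses(base=0, num_threads=4, stride=LINE_SIZE)

print(shares_line(packed))  # True  -> writes by different threads can falsely share
print(shares_line(padded))  # False -> padding removes the false sharing
```

The padded layout spends 64 bytes per counter instead of 4, which is the usual trade: more memory in exchange for fewer coherence misses.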


Why does Ocean Sim only have Capacity/Conflict misses and no Cold misses?


Could someone give me a definition for upgrade, true sharing and false sharing?


@mingf: True sharing means that two threads share some data: when one updates the data, the other must be notified so that its cached copy is updated and the program works correctly. This is also what the CPU does in hardware. False sharing means that when one thread updates its data, the other thread does not need that update to work correctly, but its cached copy gets updated anyway, and this wastes time. Sorry, I don't know what upgrade is. Can anyone explain it?


@mingf: This can be explained with an example. Consider two memory locations x and y that fall in the same cache line. Processor P1 accesses and modifies x first, which causes a cold miss and brings the whole cache line into its cache. If P1 now accesses and modifies y, it hits, because y is in the same cache line. If processor P2 now wants to access x and y, then P1 can update or invalidate both x and y at once, since they belong to the same cache line. This is true sharing. False sharing is the case where P2 modifies y while P1 modifies x, which leads to thrashing of the cache line because each processor's write invalidates the copy in the other processor's cache. Upgrade is the case where y was modified by P2 and this changed value must be updated in P1's cache. P2's cache asks the cache controller to 'update' the value of y in P1 so that P1 operates on the correct data.


Wait, I thought upgrade was the time loss/communication overhead due to a cache line changing states between 'exclusive' and 'shared', etc.?
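For what it's worth, the state-change reading can be sketched with a toy model. The code below tracks a single cache line under a simplified MSI protocol and labels a write to a line held in the Shared state as an "upgrade": the data is already cached, but the other copies must be invalidated first. This is only a sketch under that assumption; the function name `simulate_msi` and the event labels are invented here, not taken from the lecture.

```python
def simulate_msi(accesses, num_procs):
    """Track one cache line's MSI state per processor.

    accesses is a list of (proc, op) pairs with op 'R' (read) or 'W' (write).
    Returns one event label per access.
    """
    state = ['I'] * num_procs  # every cache starts without the line (Invalid)
    events = []
    for proc, op in accesses:
        if op == 'R':
            if state[proc] == 'I':
                events.append('read miss')
                # A Modified copy elsewhere is downgraded to Shared.
                state = ['S' if s == 'M' else s for s in state]
                state[proc] = 'S'
            else:
                events.append('hit')
        else:  # write
            if state[proc] == 'M':
                events.append('hit')
            else:
                # Shared -> Modified needs no data, only ownership: an upgrade.
                events.append('upgrade' if state[proc] == 'S' else 'write miss')
                state = ['I'] * num_procs  # invalidate all other copies
                state[proc] = 'M'
    return events

events = simulate_msi([(0, 'R'), (1, 'R'), (0, 'W'), (1, 'R'), (1, 'W')], num_procs=2)
print(events)
# ['read miss', 'read miss', 'upgrade', 'read miss', 'upgrade']
```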


@mingf: False sharing is when two processors need to write to different addresses that are on the same cache line. Even though those processors don't need to see each other's writes to stay up to date, it causes a lot of extra artifactual communication.


I'm curious: how was this graph generated? How can we accurately measure false sharing, true sharing, etc.? It sounds like measuring the different kinds of misses would be very useful.
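One plausible way to get such a breakdown (not necessarily how this graph was produced) is to run a memory trace through a simulator that remembers, per processor and per line, which words other processors have written since the line was last fetched. The sketch below does this under strong simplifying assumptions: infinite caches (so no capacity or conflict misses), invalidation on every remote write, no separate upgrade category, and a simplified definition in which a miss is true sharing only if the accessed word itself was written remotely. The name `classify_accesses` and the trace format are hypothetical.

```python
from collections import Counter, defaultdict

def classify_accesses(trace, line_size=64, word_size=4):
    """Classify each (proc, addr, op) access, with op 'R' or 'W'."""
    valid = defaultdict(bool)       # (proc, line) -> line currently cached?
    seen = set()                    # (proc, line) pairs that were ever cached
    dirty_words = defaultdict(set)  # (proc, line) -> words written by others
                                    # since proc last fetched the line
    procs = {p for p, _, _ in trace}
    counts = Counter()
    for proc, addr, op in trace:
        line, word = addr // line_size, addr // word_size
        key = (proc, line)
        if valid[key]:
            counts['hit'] += 1
        else:
            if key not in seen:
                counts['cold'] += 1           # first touch of this line
            elif word in dirty_words[key]:
                counts['true sharing'] += 1   # we need a value someone else wrote
            else:
                counts['false sharing'] += 1  # only a neighbouring word changed
            seen.add(key)
            valid[key] = True
            dirty_words[key].clear()
        if op == 'W':
            # A write invalidates every other copy and records the written word.
            for other in procs - {proc}:
                valid[(other, line)] = False
                dirty_words[(other, line)].add(word)
    return counts

# x lives at byte 0 and y at byte 4, so both are in cache line 0.
trace = [(0, 0, 'W'), (1, 4, 'W'), (0, 0, 'W'), (1, 4, 'W'),
         (1, 0, 'R'), (0, 0, 'W'), (1, 0, 'R')]
counts = classify_accesses(trace)
print(dict(counts))
```

In this trace the ping-pong writes to x and y show up as false sharing, while the final read of x by processor 1, right after processor 0 wrote x, is counted as true sharing.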


@cube: I believe Ocean Sim appears to have no cold misses because it performs a significantly higher number of memory operations per data element than the other programs. The number of cold misses has a fixed upper limit proportional to the size of the working set, because a cold miss only occurs on the first access to a memory location. As the program makes more memory requests, the share of misses that are cold tends toward zero. You can see the same phenomenon with Barnes-Hut, where cold misses contribute very little. This makes sense for scientific computing applications because they take a fixed-size initial state and evolve it over a long time period. During this period, they read and write the same values many times.

Radix sort has such a high proportion of cold misses because it performs relatively few total memory operations per data element.
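A quick back-of-the-envelope sketch of that argument: if the working set is a fixed number of cache lines, cold misses are capped at that number, while the other misses keep accumulating with every pass over the data. The numbers below are invented purely for illustration, and `cold_miss_fraction` is a hypothetical helper, not a measurement of Ocean or radix sort.

```python
def cold_miss_fraction(working_set_lines, other_misses_per_pass, passes):
    """Fraction of all misses that are cold after a number of passes over the data.

    Assumes each line is cold-missed exactly once and that every pass adds a
    constant number of non-cold (capacity, conflict, sharing) misses.
    """
    cold = working_set_lines
    other = other_misses_per_pass * passes
    return cold / (cold + other)

# Hypothetical workload: 1000 lines of data, 500 non-cold misses per pass.
for passes in (1, 10, 1000):
    print(passes, round(cold_miss_fraction(1000, 500, passes), 4))
```

With one pass, cold misses are about two thirds of the total; with a thousand passes they are a fraction of a percent, which is why they can vanish from a stacked bar chart.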