Slide 50 of 55
dyzz

The second version is better because it eliminates false sharing between threads: each thread's counter now sits on its own cache line. In the top version, many of the counters share the same cache line, so whenever one thread updates its counter, that line must be invalidated in the caches of the other cores. Each of those cores then has to re-fetch the line before it can update its own counter, even though no two threads ever touch the same counter.
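To make the two layouts concrete, here is a minimal C++17 sketch. It is not the code from the slide: the names `PackedCounter`, `PaddedCounter`, and `run_counters` are made up for illustration, and the 64-byte line size is an assumption that holds on most x86-64 machines but should be checked for your target.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Adjust for your hardware.
constexpr std::size_t kCacheLine = 64;

// Packed layout: adjacent counters are sizeof(int) apart, so several
// threads' counters end up on the same cache line.
struct PackedCounter {
    int value = 0;
};

// Padded layout: alignas rounds sizeof up to a full line, so each
// counter occupies a cache line of its own.
struct alignas(kCacheLine) PaddedCounter {
    int value = 0;
};

static_assert(sizeof(PackedCounter) == sizeof(int), "packed counter has no padding");
static_assert(sizeof(PaddedCounter) == kCacheLine, "padded counter fills one line");

// Each thread increments only its own counter; the return value is the sum.
// Note: an optimizing compiler may keep the counter in a register inside the
// loop, which hides the coherence traffic this layout is meant to show.
template <typename Counter>
long run_counters(int num_threads, int iters) {
    assert(num_threads > 0);
    std::vector<Counter> counters(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iters] {
            for (int i = 0; i < iters; ++i) {
                counters[t].value++;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    long total = 0;
    for (const auto& c : counters) {
        total += c.value;
    }
    return total;
}
```

Calling `run_counters<PackedCounter>(4, 1000)` and `run_counters<PaddedCounter>(4, 1000)` should both produce the same sum, since the results are identical and only the memory layout differs. Any timing difference between the two depends on the machine and the compiler flags.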

yulunt

Although the second version reduces false sharing between threads, in some cases it can hurt spatial locality and raise the miss rate: the padded data occupies more cache lines, so capacity and conflict misses become more likely. Take the RGB problem in exercise 3, for example. Each thread works on one channel, so if every element is padded, the adjacent pixels a thread accesses now fall on different cache lines. When designing a data structure, one should also consider how the program is parallelized.

thunder

For the specific example on this slide, padding each element of the counter array guarantees that every thread has exclusive access to its own cache line, which eliminates the false sharing seen in the first version. More generally, I think that to avoid false sharing while still exploiting locality, we should put data that one thread accesses sequentially on the same cache line, and put data accessed by different threads on separate cache lines.
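One way to picture that rule is the small C++ sketch below. It is only an illustration under assumptions: 64-byte cache lines, and hypothetical names (`ThreadBlock`, `same_line`, `demo_layout`) that do not come from the slide or the exercise. Each thread gets one line-aligned block holding the values it walks through sequentially, so neighbors within a block share a line while different threads' blocks never do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kIntsPerLine = kLineBytes / sizeof(int);

// One block per thread: the ints a single thread scans in order are packed
// together, and alignas keeps each block on its own cache line.
struct alignas(kLineBytes) ThreadBlock {
    int values[kIntsPerLine] = {};
};

static_assert(sizeof(ThreadBlock) == kLineBytes, "one block fills exactly one line");

// Index of the cache line that holds address p (under the 64-byte assumption).
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kLineBytes;
}

inline bool same_line(const void* a, const void* b) {
    return cache_line_of(a) == cache_line_of(b);
}

// Checks the layout property for two threads' blocks.
inline bool demo_layout() {
    std::array<ThreadBlock, 2> blocks{};
    // Data one thread accesses sequentially shares a line (good locality).
    bool within = same_line(&blocks[0].values[0],
                            &blocks[0].values[kIntsPerLine - 1]);
    // Data belonging to different threads sits on different lines.
    bool across = same_line(&blocks[0].values[0], &blocks[1].values[0]);
    assert(within && !across);
    return within && !across;
}
```

Compared with padding every individual element, this keeps a thread's sequential accesses dense, so a scan over its own values touches one line instead of one line per element.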