It is interesting to note one of the tradeoffs between distributed and global work queues. When multiple threads of the same process run on the same core, you gain from the spatial locality of their data and, at the least, from sharing the code section common to both threads. On the other hand, if those threads were distributed onto separate cores, they might write to data residing on the same cache line from different cores. This false sharing triggers the cache coherence protocol, causing frequent invalidation of the line on one core due to writes made on another.
nishadg
So this is basically a balance between a fully shared work queue (where there is contention on every access for more work) and completely static assignment (where there is no contention when getting more work).