It is interesting to note one of the tradeoffs between distributed and global work queues. When multiple threads of the same process run on the same core, you gain from the spatial locality of their data and, at the least, from sharing the code section common to both threads. On the other hand, if those threads were distributed onto separate cores, they might write to data residing on the same cache line from different cores. This false sharing triggers the cache coherence protocol, causing frequent invalidation of the line on one core due to writes made on another.
nishadg
So this is basically a balance between a fully shared work queue (where there is contention on every access for more work) and completely static assignment (where there is no contention when getting more work).