Slide 15 of 45
Elias

We mentioned this briefly in class: why is this a per-thread queue? In theory it would be simpler to have a single unified queue and have each thread poll it (much as we did with the shared work queue in assignment 4). The justification was simple: contention. With individual work queues, we have no need for synchronization in the common case, and we don't pay a high cost under a heavy load of small jobs (when there would otherwise be high contention on the one shared queue).

Doing it this way avoids the contention issue, but on its own it is fundamentally inferior to the single shared queue for load balancing - just think about choosing a checkout line at a supermarket. To help balance load, each thread polls its own queue, but steals jobs from other threads' queues when its own runs empty. The choice of which thread to steal from matters - and in fact, we choose the victim randomly. (I think Kayvon mentioned that random stealing has been proven to be within a good constant factor of the optimal online strategy.)
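Here's a toy sketch of that stealing loop as a single-threaded simulation (all names are made up for illustration; a real implementation would run these loops concurrently): each "thread" pops from its own queue, and when that's empty it picks a random victim and steals from the opposite end.

```python
import random
from collections import deque

def run_work_stealing(queues, num_threads, rng):
    """Simulate the stealing loop: each 'thread' pops from its own
    deque; when empty, it picks a random victim and steals one task."""
    completed = [0] * num_threads
    # Round-robin over threads until every queue is drained.
    while any(queues):
        for tid in range(num_threads):
            if queues[tid]:
                queues[tid].pop()                    # take from own queue (LIFO end)
                completed[tid] += 1
            else:
                victim = rng.randrange(num_threads)  # random victim selection
                if victim != tid and queues[victim]:
                    queues[victim].popleft()         # steal from the other end
                    completed[tid] += 1
    return completed

# Thread 0 starts with all the work; stealing spreads it around.
queues = [deque(range(20)), deque(), deque(), deque()]
done = run_work_stealing(queues, 4, random.Random(0))
print(sum(done))  # all 20 tasks complete
```

Note that the owner pops from one end and thieves steal from the other, so they mostly don't touch the same elements - that's the detail that makes the low-synchronization versions possible.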

oulgen

Actually, Microsoft's .NET framework documentation has a really good explanation of this issue of per-thread queues. I wouldn't have expected an explanation like this from Microsoft, but https://msdn.microsoft.com/en-us/library/ff963549.aspx (under "Decentralized Scheduling Techniques") explains why per-thread queues are better and how it is possible to do this work stealing without any locks.
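The structural idea behind the lock-free version is that the owner pushes and pops at one end while thieves steal from the other. Here's a simplified sketch of that shape (my own illustrative class, not the .NET implementation - the real one replaces this steal-path lock with atomic operations):

```python
from collections import deque
from threading import Lock

class WorkStealingDeque:
    """Structural sketch of a per-thread work-stealing deque: the
    owner uses one end, thieves use the other. A lock guards only
    the (rare) steal path; real implementations use atomics even
    there to be fully lock-free."""
    def __init__(self):
        self._items = deque()
        self._steal_lock = Lock()

    def push(self, task):
        # Owner only: add newest work at the bottom.
        self._items.append(task)

    def pop(self):
        # Owner only: LIFO end, so the owner works on its freshest
        # (cache-warm) task.
        try:
            return self._items.pop()
        except IndexError:
            return None

    def steal(self):
        # Any thief: FIFO end, taking the oldest task, far away from
        # where the owner is operating.
        with self._steal_lock:
            try:
                return self._items.popleft()
            except IndexError:
                return None

q = WorkStealingDeque()
for t in ("a", "b", "c"):
    q.push(t)
print(q.pop(), q.steal())  # owner gets newest ("c"), thief gets oldest ("a")
```

Because the two ends only meet when the deque is nearly empty, the owner's push/pop almost never conflicts with a steal.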

sgbowen

@Elias: How is synchronization avoided by individual work queues? Other threads are still stealing from those queues, so they're still being accessed concurrently. I can see how synchronization /might/ be easier to manage with the guarantee of a single producer, but it seems like it would still be necessary unless I'm missing something.

Elias

@sgbowen: The link oulgen posted is actually a very good resource - scroll down to where it says "The Thread Pool." The short answer is that we're eliminating the single point of contention, not all synchronization. With many queues, we ensure that a large number of threads never compete to pull from a single queue. If that were to happen, and the jobs were very small, execution would be reduced to essentially sequential control flow (and, worse, one with severe contention overhead).

By having threads steal from randomly selected other threads' queues, we drastically reduce the expected contention. The result is a much more efficient implementation.
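You can see the "in expectation" part with a quick Monte Carlo estimate (my own toy calculation): with one shared queue, every thread's access collides with someone else's; with one queue per thread and random victim selection, the expected number of colliding picks drops sharply.

```python
import random

def expected_collisions(num_threads, num_queues, trials, rng):
    """Monte Carlo estimate: each thread picks a queue uniformly at
    random; count picks that land on an already-chosen queue."""
    total = 0
    for _ in range(trials):
        chosen = [rng.randrange(num_queues) for _ in range(num_threads)]
        # Collisions = picks beyond the first on each queue.
        total += num_threads - len(set(chosen))
    return total / trials

rng = random.Random(42)
single = expected_collisions(8, 1, 2000, rng)      # one shared queue
per_thread = expected_collisions(8, 8, 2000, rng)  # one queue per thread
print(single, per_thread)
```

With 8 threads and one queue, all 8 picks pile onto the same queue (7 collisions every time); with 8 queues the expected collision count is under 3, and in the real system most steals don't even happen simultaneously.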