Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Perf Optimization I: Work Distribution and Scheduling

Previous | Next --- Slide 18 of 64

PandaX

Using distributed set of queues can reduce the synchronization cost of using single work queues. Because we can now have 4 threads fetching tasks from queues at the same time in this example.

mallocanswer

@PandaX, do you compare the synchronization cost of using single work queues with 4 work queues? Yes, that's true. But I'm wondering whether the introduction of steal will increase the synchronization cost.

MaxFlowMinCut

@mallocanswer That's an interesting question, and I expect that in some capacity the answer is yes. Given that threads can see each other's queues and steal work, it follows that there must be some sort of synchronization (and possibly locking if a global shared address space model is used) to share the information.

Edit: Indeed the next slide confirms that synchronizing queues can be quite costly.

karima

@MaxFlowMinCut

Yes and this slide shows that since work stealing may be costly we generally prefer to steal a larger amount of work so that once we steal, we won't have to steal again for a while.

cyl

Suppose we have four queues, how t determine "which queue" to steal, if stealing a queue that is about to empty, isn't that cause more uneven workload?

Lotusword

Stealing also introduces the important issue of termination: How do we decide when to stop searching for tasks to steal and assume that they're all done, given that tasks generate other tasks that are dynamically inserted in the queues?

lol

@cyl You randomly choose a queue to steal from. You can prove that if you use this randomized strategy, then the runtime is O(W/P + S) with high probability.