Previous | Next --- Slide 26 of 45
Back to Lecture Thumbnails
taegyunk

It would be much more beneficial to steal from the top of dequeue because

  • we can reduce the overhead from communication by exchanging a "larger" piece of work.
  • With run-child-first policy, the local thread would only push and pop at the bottom of dequeue which would potentially increase cache locality.
  • Also, stealing from other threads and pushing and popping from local thread would not interfere that much and we could reduce contention to work queue.
sluck

This paper written by one of the founders of Cilk arts details scheduling work items by stealing. In particular, it mentions that if an idle thread randomly chooses a thread to steal from but that thread also has an empty queue, then the first thread will uniformly at random choose another thread to steal from. More interestingly though, the paper talks about the motivation behind choosing to steal work instead of having a scheduler distribute the work. The idea is that when a scheduler distributes work, it is always distributing available work even to workers that are busy working. If instead threads steal work only when they're idle, then when all threads are busy, no work will be migrated around. Thus the paper claims that the overall amount of work migrated would be less when work is stolen rather than distributed.

benchoi

If most of the threads are idle (e.g. one of them is working on a large task and the rest have nothing to do), it seems that they will all be trying to steal from each other repeatedly and unsuccessfully. Would that generate enough overhead to be a significant problem, or does that not matter?

eatnow

Why is stealing done randomly, instead of keeping track of the size of work queue (or at least estimate it) and stealing from a thread with a lot of work queued up? It seems to make more sense, especially when only a few threads have work.

mchoquet

@eatnow, I imagine they're trying to avoid some sort of global data structure like that because of the pain of doing synchronization on it, as well as the extra coherence traffic. Random guessing probably has much less overhead in the normal case.

paraU

@sluck. It's helpful! You guys are great!