When a thread is idle and steals some work from another, it would be best to steal the largest amount of pending work because this will potentially reduce the number of times that the thread will have to steal again in the future, thereby reducing communication overhead.
Here, if Thread 1 had stolen the work for cont:26-50 (25 elements), it would finish that computation quickly, and then would have to find another thread to steal from again, leading to some amount of overhead. By taking cont:101-200 (100 elements), the thread has much more work to do, and will not have to steal work (and communicate) for a longer period of time.
So in the implementation of work stealing, do the worker threads just always assume that the heavier work tasks will always be at the head of the work queues? Is this always the case?
@BestBunny: I don't think we can assume that. Here we are partitioning using the middle element (which is more like mergesort). Ideally, in Quicksort the partitions vary in size. So, we cannot guarantee the task at the head of the queue will be the heaviest.
So we have two strategy. First is to always steal the task at the head of the queue, which cannot guarantee we steal the heaviest task. Second, we try to predict the cost of each task and steal the heaviest one. But it need some mechanism to predict the cost and it will have some overhead.
You could also implement a deck in a way that has the heaviest cost task at the top of the deck making it easier for the idle threads to know which one to steal.
We can't guarantee that the largest piece of work will always be at the top of the queue, so we need some mechanism to find it when stealing. There is obviously a trade-off between finding the largest piece of work and the overhead involved in looking for it. As "mario" mentioned we can probably use some special data structure to keep track of the largest piece of work, and in this way somehow reduce the overhead.
If you don't know stealing which one would be the most optimal, then stealing randomly will theoretically achieve optimal balance.
Implementation-wise, how does work stealing work with local variables? For example, if the threads of local stacks, do the threads that are stealing that work steal the stack as well?