Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Perf Optimization I: Work Distribution and Scheduling

Previous | Next --- Slide 43 of 64

BensonQiu

We mentioned in lecture that other threads should steal "cont: 100-200". Since Thread 0 is already working on 0-25, 26-50 might already be in Thread 0's L1 cache. Also, stealing has overhead, so we want to steal a larger chunk of work each time.

kayvonf

Yes, the main idea here is that if a thread is going to incur the cost of stealing some work from another thread, it's a good idea to steal the biggest piece of work that's available. Therefore, the cost of the steal is amortized over a large amount of work.

We also discussed how the smaller pieces of work at the bottom of thread 0's deck correspond to sorting the array elements that thread 0 most recently partitioned. Thus, letting thread 0 do that work in the future is wise, since the computation is likely to benefit from temporal locality in data access.

cyl

If threads share the same queue, then we don't have to steal from others. What is the disadvantage of sharing queue?

BensonQiu

@cyl: There will be contention if multiple threads access a single shared work queue.

Source

mperron

Would it be a good idea to implement a hybrid continuation stealing/child stealing? For example, if continue pushing children onto the stack until some threshold.

I am wondering because in this example, we are making assumption about the amount of work at different parts of the stack. Is this something which is actually implemented in the stealing logic or is it only a thought experiment?