Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Perf Optimization I: Work Distribution and Scheduling

Previous | Next --- Slide 40 of 64

firebb

I'm a little bit curious about how continuation stealing is implemented. For child stealing, it is relatively easy to implement, since we only need to store the function pointer and the parameter and then let a thread to execute it (like a rpc call run by another thread). However, continuation stealing needs the thief to be able to continue executing previous code. This requires the thief having the exact same stack and execution context (registers, flags) as the victim (from whom the continuation is stolen).

One possible implementation I could imagine would be whenever a cilk_spawn is met, the thread would push all its execution context onto the stack and change to another stack frame by changing rbp and store the previous rbp somewhere so some thief could set their rbp to it and pop out all the execution context and do the continuation work.

bochet

Even if no stealing occurs, in the code given above, space cost for work queue is O(1), better than O(n) in continuation first. But in a quick sort, space cost of work queue can still be up to O(log n)

kayvonf

@firebb. To learm more about the original Cilk implementation, take a look at the "Further Readings" for this lecture (on the lectures page). In particular, the Cilk5 paper from PLDI 1998.

Abandon

if continuation first, the approach is a bit like scheduler-worker model. Thread 0 are the scheduler for task dispatch. If child first, the approach is more like divide and conquer. All thread are identical in the mechanism. The later one is obviously better in task splitting and dispatching part.

rav

It seems like in this particular example, child stealing is better if you have multiple threads (i.e more than 2) that can steal. Is that something that hold true in general?