Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Previous | Next --- Slide 22 of 45

Jing

I wonder what, then, is the difference between cilk and openMP abstraction. It seems that anything that can be done using cilk can be done with openMP...

rojo

@Jing: From my understanding both Cilk and openMP are similar abstractions for the user but the implementations are different. For example openMP does not talk about stealing work from another thread like in case of Cilk. But I guess it would be interesting to see how each of these operate on a specific example. I guess the performance will then depend on the kind of working set on which each of these are used.

kayvonf

To be precise, there are differences in semantics. The Cilk code above is a sequential for loop, where each iteration of the loop body involves a (potentially async) invocation of foo. After the loop is complete the calling thread waits for all async work to complete before proceeding past the cilk_sync.

A similar OpenMP parallel for loop is:

#pragma omp parallel for
for (int i=0; i<N; i++)
   foo(i);

This code has the semantic that each loop iteration is independent and can be executed in any other, but control doesn't resume after the loop body until all iterations have been completed. Although both pieces of code essentially expose the same amount of independent work to the runtime, the semantics of the code snippets are certainly different.

byeongcp

Has either method (running child first or running continuation first) shown to be particularly faster than the other? I can see that if we run continuation first, the work queue gets occupied with set of work to do so if there are multiple idle threads, they can all steal a piece of work fast. On the other hand, the "stealing continuation" part is serialized in child first method, but eventually all the idle thread would be occupied with a work stolen from other thread that put the next continuation (and assuming foo takes a while to compute). Thus, stealing continuation would have a longer startup time, but in terms of overall performance, I can't think of a case where child stealing would be noticeably faster than continuation stealing.

muyangya

I'm still a little bit confused. Is there any conclusion on the discussion of these two methods? Is anyone strictly better than the other? As fas as I'm concerned, these two methods have mostly the same work pattern, but child stealing may generate a huge work queue upfront. So is continuation stealing strictly better?