Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

jerryzh

In the lecture, professor mentions that when Thread 0 is idle, it waits for other threads immediately when "done" is less than "spawn". However, I feel it would be more efficient for it to wait until there is no more spawn it can do(when spawn equals to 10), that is, to wait only in the end, rather than every time it finishes its work.

Why is sync implemented this way? One reason I can think of is that maybe we don't know the total number of spawns the program will have, therefore, it's better for us to wait until all the threads finished, is this the main concern?

hofstee

If your next action depends on data computed before the sync, then continuing after launching the 10 spawns will give you incorrect results.

The point of the sync is to wait until the spawned functions have completed, not that they have launched.

jerryzh

I'm not saying continuing after launching the 10 spawns. What I mean is that say we have 3 threads, and thread 0 completed, and there is still 7 threads needs to be spawned, the professor mentioned in the lecture that thread 0 will wait until the other threads has completed their work before work on new spawns, I feel that the time spent here is wasted and we can work on new spawns immediately rather than waiting for other threads since there is still work to do. Maybe this is not what the professor says, just want to confirm.

hofstee

But the continuation was stolen by the other threads, so thread 0 would have to steal it back if you wanted that to happen.

jerryzh

@hofstee, I agree. The continuation is in Thread 2 when Thread 0 finishes, so the correct behavior would be Thread 0 stoles the continuation back or Thread 0 waits until spawn == done?

jerryzh

OK, I read the later slides in the lecture. The "greedy policy" in slide 57 did it in the efficient way we discussed here. Is any advantage using the stalling join implementation?

momoda

@ jerryzh. Using stalling join, we can know that the thread which initiates the fork must perform the sync and continue to run bar() after sync point. But using greedy policy, we cannot know which thread will continue execute when reach the join point. So maybe when there is a specific requirement, like forcing us to use same thread after join point, the stalling join implement will be better.

kayvonf

Yes, to recap what @jerryzh mentions, what you're suggesting is the "greedy join" strategy that is described later in this lecture, on slide 57. This greedy strategy, where no worker is allowed to go idle and always steals when idle, is exactly what the Cilk runtime implements.