Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Slide 44 of 48

Master

axiao's summary of the previous slide is very clear. Here I explain why one barrier is enough in the code above:

Clearly, the barrier ensures that all threads have reset diff to 0 for the next round and that all threads have finished their updates to diff for the current round.

It also ensures that all threads have checked for convergence before the copy of diff they are reading gets reset again. Consider a lagging thread that is still checking for convergence while another thread has already run ahead to the next barrier. Suppose the lagging thread is checking diff[index]. The fast thread will have updated diff[(index + 1) % 3] and reset diff[(index + 2) % 3], and neither of these writes affects the correctness of the convergence check. From this we can also see that three copies of diff is the minimum.
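A minimal sketch of the single-barrier loop described above, assuming Python's threading.Barrier and threading.Lock stand in for the slide's barrier and lock. The name run_solver, the thread count, the tolerance, and the per-iteration contribution 2.0 ** -k are hypothetical; the contribution is a power of two only so that the sums are exact regardless of thread order. The real grid computation is not shown.

```python
import threading


def run_solver(num_threads=4, tolerance=0.1):
    """Run the one-barrier loop and return how many iterations each thread executed."""
    diff = [0.0, 0.0, 0.0]              # three rotating copies of the global diff
    lock = threading.Lock()
    barrier = threading.Barrier(num_threads)
    iterations = [0] * num_threads

    def worker(tid):
        index = 0                       # thread-local: which copy this round uses
        k = 0                           # thread-local iteration counter
        while True:
            my_diff = 2.0 ** -k         # stand-in for the real per-thread work
            with lock:
                diff[index] += my_diff  # accumulate into the current copy
            diff[(index + 1) % 3] = 0.0 # pre-reset the copy for the next round
            barrier.wait()              # the only barrier in the loop
            k += 1
            if diff[index] < tolerance: # convergence check on the current copy
                break
            index = (index + 1) % 3
        iterations[tid] = k

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return iterations
```

Under these assumptions every thread should stop after the same number of iterations. If threads disagreed about convergence, some would wait forever at the barrier, which is exactly the failure the three copies are meant to prevent.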

anonymous

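The barrier in the while loop guarantees that no thread can run more than one iteration ahead of another, so the copies of diff in use at any moment never exceed the cycle length of 3. In effect, it stops the rotation through the diff array from wrapping around onto a copy that is still being read.
And the global diff[3] is big enough to hold every phase that can be live at the same time: "checking diff[index]", "updated diff[(index + 1) % 3]" and "reset diff[(index + 2) % 3]".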
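The three phases can be checked with plain index arithmetic. The sketch below is only an illustration: the helper names slots_in_flight and is_safe are hypothetical, and "safe" here means just that the copy a lagging thread is checking is not written by a thread that is one iteration ahead.

```python
def slots_in_flight(k, copies):
    """Copies touched while a lagging thread is still checking iteration k."""
    checking = k % copies             # lagging thread reads diff[index]
    accumulating = (k + 1) % copies   # fast thread adds into diff[(index + 1) % copies]
    resetting = (k + 2) % copies      # fast thread zeroes diff[(index + 2) % copies]
    return checking, accumulating, resetting


def is_safe(copies, iterations=12):
    """True if the copy being checked is never written by the fast thread."""
    for k in range(iterations):
        checking, accumulating, resetting = slots_in_flight(k, copies)
        if checking in (accumulating, resetting):
            return False
    return True
```

With 2 copies the reset lands on the copy being checked, since (k + 2) % 2 equals k % 2. With 3 or more copies the three indices are distinct.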

Abandon

Let me explain why we need diff[3] rather than diff[2]. Assume that we use a diff array of length 2. First, all threads arrive at the barrier with index=0, and they have all already set diff[1]=0.f for the next round of the loop. Then all threads pass the barrier. Some fast threads finish the rest of the loop (checking diff[0]), set index=1, and continue into the next round up to the instruction just before the barrier. Meanwhile, some slow threads have not yet executed the first instruction after the barrier in the first round. In this case, the fast threads may execute the instruction before the barrier, which now sets diff[0]=0.f. So when the slow threads finally check diff[0], they find diff[0]=0.f and exit immediately, which is incorrect because the real value of diff[0] has been lost.
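The interleaving above can be replayed deterministically without real threads. This is a sketch under stated assumptions: the function name slow_thread_view and the contribution of 1.0 per thread are hypothetical, and only two threads (one fast, one slow) are modelled.

```python
def slow_thread_view(copies, contribution=1.0):
    """Replay the fast/slow interleaving; return (true diff[0], diff[0] seen by slow thread)."""
    diff = [0.0] * copies

    # Round 0, before the barrier: both threads accumulate into diff[0]
    # and pre-reset the copy for the next round.
    for _ in ("fast", "slow"):
        diff[0] += contribution
        diff[1 % copies] = 0.0

    # Both threads pass the barrier. This is the value each should check.
    expected = diff[0]

    # The fast thread checks diff[0], advances its index, and runs the next
    # round up to the barrier: accumulate, then pre-reset the following copy.
    fast_index = 1 % copies
    diff[fast_index] += contribution
    diff[(fast_index + 1) % copies] = 0.0

    # Only now does the slow thread perform its round-0 check.
    seen = diff[0]
    return expected, seen
```

With copies=2 the slow thread sees 0.0 instead of the accumulated 2.0 and would wrongly conclude convergence. With copies=3 the reset lands on diff[2] and the slow thread sees the correct value.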