Previous | Next --- Slide 38 of 56
Back to Lecture Thumbnails
mchoquet
  1. Why not use two diff variables, instead of 3?

  2. This does reduce the synchronization, but it also increases the total amount of work done. Theoretically, the processor would be doing other work on the cores that were stuck at the extra barriers, but now those cores are computing the next iteration, and that work might not ever be needed. Is this a problem? It seems like the code we write here mostly assumes being run in a vacuum since high-performance programs usually have their own machines, so is the fact that it wastes some processor time not important?

kayvonf

@mchoquet: I do not believe this implementation performs any additional work compared to the version on the previous slide.

There may be a more efficient implementation... (if so, let's see it!)

mchoquet

Oh wait, you're right. I misread the code and thought some threads might be able to start doing an unnecessary extra iteration, but that's impossible. I'm still wondering why the diff array has 3 entries instead of 2 though.

wanghp18

@mchoquet: Assume there are two threads consume running right after the barrier. Suppose thread1 is suspended, and thread2 runs just before the barrier. If there are only two diffs, thread2 will zero out the diff thread1 will use to set the conditional variable, and will cause premature termination of thread1.

bwasti

I think doing it with two diffs and one barrier is impossible by simply reorganizing the logic in this code. Here's my reasoning:

You need a barrier between an update to the current diff and a check of the current diff. If this were not the case, the diff could be checked too early/updated to late and lead to an incorrect result.

Now assume we only have two diffs and one barrier. I've shown the barrier must exist after the update of the diff and before the check of the diff. Somewhere in between those two points must be a zeroing of the next diff. This is where problems are caused.

Review the current state (update current diff) --> barrier --> (check current diff).

We need to insert (zero out diff) before we start using it. If the diff is only used once, we'd have two loops in the while loop and there would be no problem, but we can't assume that.

There are two reasonable ways to insert (zero out diff):

(update current diff) --> barrier --> (check current diff) --> (zero out [next] diff)

and

(zero out [current] diff) --> (update current diff) --> barrier --> (check current diff)

If you put the zero out anywhere else, you'd clearly have a problem. However, both of these cases (which I believe are exhaustive) can screw things up. Observe that they are essentially the same as long as the loop guard does not evaluate to false and consider the second case:

(zero out [current] diff) --> (update current diff) --> barrier --> (check current diff)

Have two threads race from the barrier. Assume one thread just checked the current diff. A second thread at this point has already zeroed out and updated the next diff. Now the first one starts back at the top of the loop. It zeros out the current diff and ruins the result of the second thread.

I think this is general enough to show that as long as there is a "done" state controlled by the "diff" value, you can't just have two diffs and one barrier. However, if the loop guard does some crazy stuff, I'm not sure what is possible. Also the conditional that sets the "done" state could potentially have an accumulator that allows you to reuse diff without zeroing it out, which would solve the main problem I pointed out.