Previous | Next --- Slide 37 of 56
Back to Lecture Thumbnails
smklein

Although the number of barriers can be changed, here is their rationalization for this program:

Barrier 1: Needed, otherwise a thread could add to diff in the line diff += myDiff, and a different thread could reset the global diff to zero in the line float myDiff = diff = 0.f immediately afterwards.

Barrier 2: Needed, otherwise a thread may determine that it is done with an unnaturally low diff which doesn't have the contributions from other threads.

Barrier 3: Needed -- other threads must compute if 'done' is "true" or "false" based on the line if (diff / (n * n) < TOLERANCE) before anyone has the chance to set the global diff back to zero at the beginning of the while loop.

spilledmilk

I believe the code can be slightly rewritten so that only the first two barriers are needed. The code after the for loop should look like:

lock(myLock);
diff += myDiff;
unlock(myLock);
done = true;
barrier(myBarrier, NUM_PROCESSORS);
if (diff/(n*n) >= TOLERANCE)
    done = false;

This essentially sets done := true by default and then checks in the if statement if it should be set to false. The difference between this and the given code is that after the second barrier, if diff/(n*n) < TOLERANCE, then each thread skips the new if statement and finishes the while loop. However, if diff/(n*n) >= TOLERANCE, the first thread that encounters the if statement will set done := false, and because the only place done is set to true is between the two barriers in the while loop, all threads will continue on to the next iteration of the while loop even if diff is reset to 0 before they process the if statement, removing the necessity of the last barrier in the original code.

bxb

Thinking about spilledmilk's answer, his method is valid assuming that done, stored in global memory somewhere, gets checked by every thread on every access. If done was instead stored in a register I'm not sure it will necessarily work every time. Aside from C's volatile keyword, what other mechanisms for dealing with thread memory accesses/caching are there? (I'm not sure I'm wording my concern properly but hopefully it makes sense.)