Previous | Next --- Slide 43 of 48
Back to Lecture Thumbnails

The first Barrier is to prevent the case that after some thread calculates diff, diff gets overwritten by other threads when executing diff = 0.f.

The second Barrier is to make sure that all threads has calculated their own diff, so the final diff is the sum we want before checking convergence.

The third Barrier is to prevent the case that some thread already on next iteration writes diff to 0, which will make the if statement true for other slower threads, leading to a wrong result.


Note that if the pseudocode also included code to compute the black cells that there would be a barrier after the for (i = red cells in this row) loop.


I think we can replace the first and third barrier with one barrier right after 'while(!done)'


I have one question about the third barrier. I think the third barrier is useless because the result will not change. I will come back to add more information...

I'm back.

Ok, what I mean here is, even the diff was set to 0, the answer of (diff/(n * n) < TOLERANCE) will not change. As a result, if one thread passed the (diff/(n * n) < TOLERANCE), the diff will not be set to zero; if one thread do not passed (diff/(n * n) < TOLERANCE), other threads will definitely fail (diff/(n * n) < TOLERANCE) no matter diff was set to zero or not.

Am I right or not?


If the third barrier were not there, then one thread could move on to the next iteration of the while(!done) loop before another thread. This means the first thread could set diff=0 before the second thread evaluated the (diff/(n*n) < TOLERANCE) line.


After some thought, it seems that if the 1st and 3rd barrier were placed tightly around diff = 0.f, things might have been a litter clearer for me at first.


I will try to use one barrier now. Obviously we cannot use only one barrier with other code unchanged. We can see that it is because all workers are working on a same variable diff here that makes them have to wait for each other. I think we can assign a diff variable to each worker, so that we can same some barrier.

It is true that this approach will go super slow if we have many workers working in parallel. However, if we assume we have less than 16 workers now, the overhead is not that high.

The code is below:

float* diff; // With size of NUM_PROCESSORS;

float myDiff;
int threadId = getThreadId();
int myMin = 1 + (threadId * n / NUM_PROCESSORS);
int myMax = myMin + (n / NUM_PROCESSORS);

while(!done) {
    float mydiff = 0.f;
    diff[threadId] = 0.f;
    for (j = myMin to myMax) {
        for (i = red cells in this row) {
            float prev = A[i, j];
            A[i,j] = 0.2f * pointsAllAround();
            myDiff += abs(A[i,j] - prev);
    for (int i = 0; i < NUM_PROCESSORS; ++i){
        diff[i] += myDiff;
    barrier(myBarrier, NUM_PROCESSORS);
    if (diff[threadId]/(n*n) < TOLERANCE)
    {                           d


Can we remove three barriers and diff=0.f, then put one barrier exactly before lock and diff=0.f before that barrier?

Update: Never mind, there would be race if so.


I think if we have a different diff variable for each thread, we could remove the bottom barrier. However, I need to work out the logic to see if it holds true.


One way of thinking about why the barriers are needed is to compare two separate executions across the barriers and observe how the race conditions happen.