monsterrev

Can someone please explain how this works?

bpr

@monsterrev, could you elaborate on your concerns?

nba16235

As far as I can tell, each update of the variable "diff" consists of three stages: initialization (diff[(index+1)%3] = 0.0f), addition (diff[index] += myDiff), and check (if(diff[index]/(n*n) < TOLERANCE)).

Now let's see why we need three instances of diff (float diff[3]) to guarantee synchronization. In the first iteration of the loop, the barrier makes sure two things have completed: every thread's myDiff has been added to diff[0], and diff[1] has been initialized to 0. In the second iteration, the barrier guarantees three things: diff[0] can be checked without being overwritten, every myDiff is added to diff[1] (which is guaranteed to start at 0), and diff[2] is initialized to 0.
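For what it's worth, here is a minimal pthreads sketch of the loop as I understand it from the three stages above. It is not the slide's code: fake_local_diff, run_solver, NUM_THREADS, N and the TOLERANCE value are placeholders I made up, and I moved the reset of the next slot inside the lock so the sketch has no unsynchronized writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4      /* hypothetical thread count */
#define N 4                /* hypothetical grid dimension n */
#define TOLERANCE 0.01f    /* hypothetical convergence threshold */

static float diff[3];      /* three rotating copies of the global diff */
static pthread_mutex_t my_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t my_barrier;
static int iterations[NUM_THREADS];  /* iterations each thread ran */

/* Stand-in for the real per-thread grid update (assumption):
   the local difference simply halves on every iteration. */
static float fake_local_diff(int iter) {
    return 1.0f / (float)(1 << iter);
}

static void *solve(void *arg) {
    int id = *(int *)arg;
    int index = 0;                        /* thread-local slot index */

    for (int iter = 0; ; iter++) {
        float my_diff = fake_local_diff(iter);

        pthread_mutex_lock(&my_lock);
        diff[index] += my_diff;           /* addition */
        diff[(index + 1) % 3] = 0.0f;     /* initialization of the next slot */
        pthread_mutex_unlock(&my_lock);

        /* The single barrier per iteration: all additions are done. */
        pthread_barrier_wait(&my_barrier);

        if (diff[index] / (N * N) < TOLERANCE) {   /* check */
            iterations[id] = iter + 1;
            break;
        }
        index = (index + 1) % 3;          /* rotate to the next slot */
    }
    return NULL;
}

/* Runs the sketch and returns how many iterations thread 0 needed. */
int run_solver(void) {
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    diff[0] = 0.0f;                       /* one-time initialization */
    int rc = pthread_barrier_init(&my_barrier, NULL, NUM_THREADS);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, solve, &ids[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_barrier_destroy(&my_barrier);
    return iterations[0];
}
```

With these placeholder values every thread reads the same sum after the barrier, so they all leave the loop in the same iteration, and run_solver() returns 6.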

Therefore, I think three stages, and hence three copies of diff, are the minimum needed to guarantee synchronization. Please correct me if my understanding is wrong.

acortes

Could someone explain the reason behind the 3 in "index = (index + 1) % 3;"? It looks like an arbitrarily chosen number, since 2 would appear to work as well (as would any number larger than 3).

@monsterrev, what this code does is keep an array of diff values and update the current entry under a lock. Each thread then moves on to the next index (mod 3) so that the new diff is stored there. The barrier (shown in red on the slide) makes sure all processors have updated diff[index], so the comparison that follows is accurate.

EggyLv999

@acortes

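The size-3 array is a little trick that we can use to reduce memory usage. First, think of diff as a long array with one slot per timestep (iteration) instead of 3 slots, where each slot holds the value of diff at that timestep, starting from 0. It turns out that at most three slots are in use at any moment: the one a slower thread may still be checking (the previous timestep), the one being accumulated into (the current timestep), and the one being reset to 0 for the next timestep. So we can cut the array down to just 3 elements and rotate through them, which is what (index+1)%3 is doing. Two elements would not be enough, because resetting the next slot would overwrite the value that a slower thread has not checked yet.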
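To make the "why not 2" case concrete, here is a small single-threaded replay of one unlucky interleaving, as a sketch rather than the slide's code. The function slow_thread_reads and its parameters are hypothetical: it plays a fast thread and a slow thread by hand and returns what the slow thread sees when it finally checks the first iteration's sum.

```c
#include <assert.h>

/* Replays one interleaving with `slots` copies of diff (hypothetical helper).
   Returns the value the slow thread reads when it checks iteration 0's sum;
   the correct value is `expected_sum`. */
float slow_thread_reads(int slots, float expected_sum) {
    float diff[8] = {0.0f};
    int index = 0;

    assert(slots >= 1 && slots <= 8);

    /* Iteration 0, before the barrier: both threads add their share
       and the next slot is reset. */
    diff[index] += expected_sum / 2;          /* fast thread's myDiff */
    diff[index] += expected_sum / 2;          /* slow thread's myDiff */
    diff[(index + 1) % slots] = 0.0f;

    /* Barrier passes. The fast thread checks, advances its index and
       starts iteration 1 before the slow thread has done its check. */
    int fast_index = (index + 1) % slots;
    diff[fast_index] += 1.0f;                 /* arbitrary myDiff for iteration 1 */
    diff[(fast_index + 1) % slots] = 0.0f;    /* with slots == 2 this is diff[index] */

    /* Only now does the slow thread check iteration 0. */
    return diff[index];
}
```

With 3 slots the slow thread still reads the full sum. With 2 slots the fast thread's reset lands on the slot the slow thread is about to check, so the slow thread reads 0, and it could then decide the solver has converged and leave the loop while the fast thread waits at the next barrier.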