Do we need 3 copies? As far as I can tell, only 2 of the indices are used between two iterations, and each iteration will have to complete before continuing because of the barrier.
@rmanne Suppose we have two programs that get to the barrier.
Program 1 has index i, and then it pauses before the if statement after the barrier.
Program 2 continues on, and its index is now i+1.
Program 2 runs until it hits the barrier, so it has accessed diff[i+1], and diff[i+1+1].
Program 1 now resumes, and accesses diff[i].
Hence, 3 copies.
If we had only 2 copies, Program 2 could potentially run until the barrier again, and then Program 1 might call break on the tolerance check when it was not supposed to.
OH, he use a window for 'diff'...
i think the reason we have an array of size 3 is we need 3 variables to remove dependency: #1 for current computation; #2 for stalled 'threads'; and #3 for fast 'threads'