Suppose for simplicity we have 2^k elements. Then we iterate from i = 0 to i = k - 1, and on each iteration for each index j set A[j] += A[j - 2^i] (as long as not out of bounds).
This means we maintain the invariant that at the end of iteration i the element at position j is equal to sum(j - 2^(i+1) + 1, j), the sum of all elements from index j - 2^(i+1) + 1 to index j (cap the lower bound at 0).
To see why this is inductively true: at start of iteration i, A[j] = sum(j - 2^i + 1, j) (right after end of iteration i - 1). Now we set A[j] as follows:
As desired by end of iteration i. This holds true for any operation, not just sum.
Cake
Shouldn't this only hold for associative operations? I might be forgetting my 210 here..
eourcs
Yes, this scan implementation would only be equivalent to a sequential scan implementation for associative operations. In a sequential implementation, A_0-2 = (((A_0) + A_1) + A_2), but in this implementation, A_0-2 = (A_0 + (A_1 + A_2)). This is kind of interesting to note since floating point addition is not associative, so this implementation would not be correct (well, maybe close enough) for floating point sum!
This algorithm works as follows:
Suppose for simplicity we have 2^k elements. Then we iterate from i = 0 to i = k - 1, and on each iteration for each index j set A[j] += A[j - 2^i] (as long as not out of bounds).
This means we maintain the invariant that at the end of iteration i the element at position j is equal to sum(j - 2^(i+1) + 1, j), the sum of all elements from index j - 2^(i+1) + 1 to index j (cap the lower bound at 0).
To see why this is inductively true: at start of iteration i, A[j] = sum(j - 2^i + 1, j) (right after end of iteration i - 1). Now we set A[j] as follows:
A[j] = sum(j - 2^i + 1, j) + A[j - 2^i]
= sum(j - 2^i + 1, j) + sum(j - 2^i - 2^i, j - 2^i)
= sum(j - 2^i + 1, j) + sum(j - 2^(i+1), j - 2^i)
= sum(j - 2^(i+1), j)
As desired by end of iteration i. This holds true for any operation, not just sum.
Shouldn't this only hold for associative operations? I might be forgetting my 210 here..
Yes, this scan implementation would only be equivalent to a sequential scan implementation for associative operations. In a sequential implementation, A_0-2 = (((A_0) + A_1) + A_2), but in this implementation, A_0-2 = (A_0 + (A_1 + A_2)). This is kind of interesting to note since floating point addition is not associative, so this implementation would not be correct (well, maybe close enough) for floating point sum!