Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Previous | Next --- Slide 26 of 59

thomasts

I'm O.K. with the upsweep phase -- it reminds me of the construction of a Fenwick tree. But is there any intuition as to what the downsweep phase is doing? It seems slightly miraculous that placing a zero in the final entry and swapping things around gets the final answer. By looking at the arrows, it's clear the the downsweep works, but it's not clear to me what the inspiration to this algorithm is.

hofstee

@thomasts you can read the paper here, but essentially what is happening is we're reordering and processing a tree.

jhibshma

@thomasts -- I think the heart and soul of scan is the following statement: If we have the completed partial sums for every-other index, then it's easy to get the partial sums for the whole array.

Say we have the partial sums for the odd indices: p1, p3, p5, etc. Then the 2nd partial sum p2 = p1 + a2. Similarly, p4 = p3 + a4, p6 = p5 + a6, and so on.

The magic of scan is its recursive logic: Once we've added a0 to a1, a2 to a3, etc., we can call scan on that set of sums [a0-1, a2-3, ..., a(n-1)-n] and get back the partial sums for the odd indices.

This picture illustrates an in-place version of the algorithm, and in this case it's illustrating exclusive scan (p0 lands in index 1, p1 in index 2, etc). Thus, in exclusive scan, we don't care about the sum of the whole array (p15 = p14 + a15).

Consider the inserting of 0 just below the dotted line to be a preparation step. Then, on the row below it, we get the final answer for the exclusive scan of [a0-7, a8-15]. Namely, the answer is [0, a0-7]. After that, we descend one more layer and get the exclusive scan of [a0-3, a4-7, a8-11, a12-15]; namely, [0, a0-3, a0-7, a0-11]. This continues until we get the full exclusive scan.

Hopefully that's a helpful explanation.