Both up-sweep phase and down-sweep phase have N work. So the work complexity should be $\Theta (2N)$. Similarly the span complexity should also be $\Theta (2\lg N)$. In my opinion, this algorithm does not have good locality because we don't use continuous elements in the array.

grizt

It's pretty nice that the constants are relatively low (no Fibonacci heap madness here). It's easy to see that both the upsweep and the downsweep have log(n) levels each, so the span is 2 log(n). The work is a bit more complicated: at level i in the upsweep we touch n/(2^i) elements, and there are almost exactly log(n) levels, so the work done at the upsweep is 2n. And similarly for the downsweep, the work done is 2n. So the total work is 4n.

mallocanswer

@grizt. Why should the work of upweep be 2n? I agree that at level i we touch $\frac{n}{2^{i}}$. For simplicity, we assume n is a power of 2, let's say $2^{k}$. So the whole work would be $n\times(\frac{1}{2}+\frac{1}{2^{2}}+...+\frac{1}{2^{k}}) = n*[1-(1/2)^{k}]=n$

grizt

@mallocanswer, at the first level we touch all $n$ elements. So your sum inside of the parenthesis is missing the first term, and should look like $n (1 + \frac{1}{2} + \dots + \frac{1}{2^k})$ so the geometric sum is upper bounded by 2.

mallocanswer

@grizt Ohh, I see your point. I just do the algorithm in place so do not count the first level into my calculation. Thanks for your clarification.

grizt

@mallocanswer, the reason I included it was because I saw it accessed all $n$ pieces of memory in the first step, even though it only performed $\frac{n}{2}$ addition operations. It's weird to say "the constant factor is exactly 4" or "exactly 2" because we know that there will be other factors affecting this code's performance (all models are wrong, some are useful), but I thought that memory requests (even if they are all cache hits) would require at least one cycle and so they would be very relevant.

Both up-sweep phase and down-sweep phase have N work. So the work complexity should be $\Theta (2N)$. Similarly the span complexity should also be $\Theta (2\lg N)$. In my opinion, this algorithm does not have good locality because we don't use continuous elements in the array.

It's pretty nice that the constants are relatively low (no Fibonacci heap madness here). It's easy to see that both the upsweep and the downsweep have log(n) levels each, so the span is 2 log(n). The work is a bit more complicated: at level i in the upsweep we touch n/(2^i) elements, and there are almost exactly log(n) levels, so the work done at the upsweep is 2n. And similarly for the downsweep, the work done is 2n. So the total work is 4n.

@grizt. Why should the work of upweep be 2n? I agree that at level i we touch $\frac{n}{2^{i}}$. For simplicity, we assume n is a power of 2, let's say $2^{k}$. So the whole work would be $n\times(\frac{1}{2}+\frac{1}{2^{2}}+...+\frac{1}{2^{k}}) = n*[1-(1/2)^{k}]=n$

@mallocanswer, at the first level we touch all $n$ elements. So your sum inside of the parenthesis is missing the first term, and should look like $n (1 + \frac{1}{2} + \dots + \frac{1}{2^k})$ so the geometric sum is upper bounded by 2.

@grizt Ohh, I see your point. I just do the algorithm in place so do not count the first level into my calculation. Thanks for your clarification.

@mallocanswer, the reason I included it was because I saw it accessed all $n$ pieces of memory in the first step, even though it only performed $\frac{n}{2}$ addition operations. It's weird to say "the constant factor is exactly 4" or "exactly 2" because we know that there will be other factors affecting this code's performance (all models are wrong, some are useful), but I thought that memory requests (even if they are all cache hits) would require at least one cycle and so they would be very relevant.