Previous | Next --- Slide 27 of 64
Back to Lecture Thumbnails
apk

Constants:

  • Work - Around 3, since there are 2 adds and 1 copy.
  • Span - 2, since there are upsweep and downsweep (lgN+lgN)
haoala

While the work-efficient scan algorithm has better work (asymptotically) than the one two slides ago (O(N) < O(N lg N)), its constant for span is larger by a factor of approximately two, since the upsweep and downsweep stages each have similar span to that of the previous algorithm.

Metalbird

I think the locality is fairly good for this algorithm. At least in the first step of the up-sweep and the last step of the down-sweep, values are being added to values near them in storage. During the middle iterations, there may be some locality issues if we have to bring a whole new line into the cache for every value that we would need, but I may be mistaken on this point.

cluo1

This algorithm has less work to do but a larger span: O(2* lgN). If the array is small enough so that all the elements can be concurrently executed on GPU, the run time will be dominated by span. As the result, this algorithm will run slower.

sadkins

The recursive implementation of scan has inefficient work compared to the looped implementation, though it has logarithmic span. Span calculations assume an infinite number of processors(which isn't practical) and don't consider overhead communication costs. Therefore good span does not necessarily mean the algorithm will perform well in practice