Here is a very fast implementation of up-sweep and down-sweep that also implements multi-scan - performing multiple related scans in one pass. (more info on page 13 of this pdf)
This comment was marked helpful 0 times.
This code suffers from poor locality and the performance can be improved using a segmented scan. In a segmented scan, we break up the input array into small contiguous blocks and perform a scan of each of them. Though it performs more work, it can lead to better performance due to better utilization of the cache.