I think the code given on this page is an inclusive scan.
But this code wouldn't scale to a different number of SIMD units, would it?
A common theme seems to be that, at certain scales, it's faster to use an asymptotically slower algorithm because of constant costs.
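To put rough numbers on this, here is a small cost-counting sketch comparing a naive (Hillis-Steele style) scan with a work-efficient (Blelloch style) scan. The algorithm names and the cost model are my own assumptions, not taken from the page. The model assumes a lock-step SIMD unit where one parallel step costs the same no matter how many lanes are active, so the step count matters more than the total number of additions.

```python
import math

def hillis_steele_cost(n):
    """Return (parallel steps, total additions) for a naive inclusive scan of n elements."""
    steps = int(math.log2(n))
    # In step k, every element at index i >= 2**k adds in element i - 2**k.
    adds = sum(n - 2**k for k in range(steps))
    return steps, adds

def blelloch_cost(n):
    """Return (parallel steps, total additions) for a work-efficient exclusive scan."""
    levels = int(math.log2(n))
    # Up-sweep: n/2 + n/4 + ... + 1 = n - 1 additions over log2(n) steps.
    # Down-sweep: another n - 1 additions over log2(n) steps.
    return 2 * levels, 2 * (n - 1)

for n in (32, 1 << 20):
    print(n, hillis_steele_cost(n), blelloch_cost(n))
```

For a 32-wide block the naive scan does about twice the additions (129 vs. 62) but needs half the steps (5 vs. 10), so under this model it wins when the lanes would otherwise sit idle. At large n the O(n log n) work of the naive version dominates instead.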
@mallocanswer: it's called exclusive because it returns the exclusive result (although the inclusive result is what's stored in memory).
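A minimal Python simulation of that idea, assuming a Hillis-Steele style lock-step scan over one block of `width` lanes (the function name `simd_scan_exclusive` and the `width` parameter are hypothetical, not from the page). The inclusive scan is written back into the array, and each lane returns its exclusive result by reading its left neighbour's slot.

```python
def simd_scan_exclusive(data, width=32):
    """Simulate a lock-step SIMD scan over one block of `width` elements.

    The inclusive scan is written back into `data` in place; the return
    value is the exclusive scan, one entry per lane.
    """
    assert len(data) == width and width & (width - 1) == 0, "width must be a power of two"
    offset = 1
    while offset < width:
        # Snapshot first so every lane reads before any lane writes,
        # mimicking lanes that execute in lock step.
        old = list(data)
        for lane in range(offset, width):
            data[lane] = old[lane - offset] + old[lane]
        offset *= 2
    # Exclusive result for lane i is the inclusive result stored at lane i - 1.
    return [data[lane - 1] if lane > 0 else 0 for lane in range(width)]

values = [1, 2, 3, 4]
print(simd_scan_exclusive(values, width=4), values)
```

Here `width` is just a parameter, whereas a real kernel typically hard-codes the lane mask and step count for its hardware width. Inputs longer than one block would need a second level (scan each block, scan the block totals, then add those back).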