Slide 27 of 63
taoy1

Does anyone know which method (this slide's or the next slide's) will perform better? Their up-sweep and down-sweep phases seem to be the same.

I think the difference is that this slide's method has one barrier in the middle: P2 must wait for P1 to finish computing $a_{0-7}$ and read that value before it continues with the down-sweep.

In the next slide, by contrast, P1 and P2 each run a sequential scan on their own half separately. Finally, they synchronize once, and P1 adds $a_{0-7}$ to elements 8-11 while P2 adds $a_{0-7}$ to elements 12-15.
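For concreteness, here is a minimal sketch of the next-slide scheme as described above. It assumes a 16-element input split evenly between two threads standing in for P1 and P2. `two_core_scan` is a hypothetical name, and Python threads only model the synchronization structure: because of the GIL they will not actually run the arithmetic in parallel.

```python
import threading
from itertools import accumulate


def two_core_scan(a):
    """Inclusive scan using two threads as stand-ins for P1 and P2 (illustrative sketch)."""
    n = len(a)
    assert n % 4 == 0, "sketch assumes the input splits evenly into quarters"
    half = n // 2            # P1 owns out[0:half], P2 owns out[half:n]
    quarter = half // 2      # each core fixes up one quarter of the array at the end
    out = list(a)
    barrier = threading.Barrier(2)

    def worker(pid):
        lo, hi = (0, half) if pid == 0 else (half, n)
        # Phase 1: each core scans its own block sequentially, with no communication.
        out[lo:hi] = list(accumulate(out[lo:hi]))
        # The single synchronization point: both block scans must finish
        # before anyone reads a_{0..half-1}, the last element of P1's block.
        barrier.wait()
        # Phase 2: split the fix-up of the upper block between the two cores.
        # pid 0 adds the offset to out[half:half+quarter] (elements 8-11 for n = 16),
        # pid 1 adds it to out[half+quarter:n] (elements 12-15 for n = 16).
        offset = out[half - 1]
        start = half + pid * quarter
        stop = start + quarter
        for i in range(start, stop):
            out[i] += offset

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Calling `two_core_scan(list(range(1, 17)))` should give the same result as a sequential inclusive scan of that list.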

So which one performs better, or are they about the same?