Previous | Next --- Slide 33 of 59
Back to Lecture Thumbnails
kayvonf

Question: This slide states that the work-efficient version of scan would require more instructions than this version. Why is this the case?

ericwang

In this context, we have 32-wide SIMD execution for 32 elements. So one instruction can finish one round calculation for all 32 elements. So around lg(N) instructions would be enough for this implementation.

Then in work-efficient version, SIMD execution can still handle maximum 32 elements one time. But up-sweep plus donw-sweep will require 2*lg(N) rounds. Actually this version's implementation never uses more than 50% SIMD capability. The utilization is about half of the one in this slide.

As a result, more than 2x instructions would be needed.