Previous | Next --- Slide 31 of 65
Back to Lecture Thumbnails
askumar

I was still a little unclear about what happens with exceptions or errors if both branches of the conditional are executed. If we don't have a way to handle or avoid this, how do we use this method? Does the language have to provide a way to deal with exceptions?

Xiao

I will use Intel's SIMD execution as an example. After compilation, the program is actually still sequential. There is a mask register which determines which lanes will receive side effects. Hence, if the mask bit of a lane is off, running through a branch does effectively nothing. In this case, the sample code would be translated into something along the lines of

SET_MASK(x > 0)    // If the lane's x is > 0, then set mask as 1, else 0

y = exp(x, 5.f);   // Every lane still performs these instructions
y *= Ks;           // but only matters for those with the mask on
relf = y + Ka;

NEGATE_MASK        // Do same thing for the other lanes

x = 0;
relf = Ka;

So to the processor, it looks like a normal program. For further information, here is how ISPC does it.

Now, in the case of a processor where every thread/warp could be doing something differently, this task becomes much harder, and it sort of boils down to how precise you want your exception handling to be. Saving the context of every one of the 1000+ threads and passing them to an exception handler sounds quite unrealistic. A way to do it is to have the processor remember checkpoints, which are states where execution was known to be correct, and fall back to the last checkpoint when it goes boom. Of course, this means exception handling will not be as precise as a CPU's exception unit, but this method can grant consistent recovery at a reasonable cost. Here is an ACM paper that explores this issue in more detail.

alex

In my (albeit limited) experience with languages that support these semantics, most don't have a high level mechanism for exceptions. I believe that most will only cause an exception if an active lane would error. For example, if the size of an array is not a even multiple of the number of lanes it is common practice to run all lanes but mask away any lanes that would be accessing invalid data.

Perhaps @kayvonf can offer some more insight?

martin

I'm still a little confused about this example. As shown by the graph above, it seems that ALU1, ALU2, and ALU4 are executing the true branch in parallelism taking advantage of SIMD, and afterwards the rest of the ALUs are executing the operations in the false branch. But I don't get why both branch can be executed in one loop iteration. Also, what are the rest of the ALUs working on (indicated by the red cross) while ALU1,2,4 are working on the true branch? Can someone clarify a little bit more for me?

fangyihua

@martin ALU 1-8 are all executing the same instruction, but only ALU 1,2,4 allows to write the result into register or memory. The rest of ALUs do not stop working; since the mask can disable ALUs from writing, it seems that they are not doing any work.