Previous | Next --- Slide 6 of 52
Back to Lecture Thumbnails
kayvonf

Question: I want to see someone take a crack at describing the semantics of the call to the ISPC function sinx here. What does ISPC programming model dictate this call does? (I'm not talking about implementation, I'm talking semantics)

aaguilet

I feel like I'm just repeating the slide, but here I go: Semantically, the call to the ISPC function will create a group of program instances, which will execute code in parallel (in this case, calculate the sin function for an array of floating point values). Upon completion, the thread that called the ISPC function will resume execution. Any hint on how can I expand?

kayvonf

@aaguilet: No problem at all repeating a slide in your own words. Good work.

However, perhaps to convince you that the exercise is not as trivial as it seems, I'll again repeat the idea of the slide with even more detail. The call of the ISPC function sinx launches a gang of programCount program instances that run concurrently---although as Matt points out, the semantics essentially imply run in parallel---(a "group of program instances" is called a gang). Each instance in a gang has a different programId. When all instances complete execution of the ISPC function sinx then control returns to the calling C code. (just saying "upon completion is vague"... upon completion of what?).

selena731

I'm not a 100% sure on this, but I'd guess that "completion" refers to the program counter (not the programCount ISPC variable, but the instruction PC) reaching the end of and returning from the sinx subroutine. Copy pasting from the ISPC Execution model guide:


"The ispc execution model provides an important guarantee about the behavior of the program counter and execution mask: the execution of program instances is maximally converged. Maximal convergence means that if two program instances follow the same control path, they are guaranteed to execute each program statement concurrently."


And that is assuming that there is conditional control flow in the code. But in this case, sinx does not have any conditionals, so essentially the execution mask for each instruction will be all 1's.

In regards to for loops, here's another copy-pasted example from the guide:


"Consider for example an image filtering operation where the program loops over pixels adjacent to the given (x,y) coordinates:

float box3x3(uniform float image[32][32], int x, int y) {
    float sum = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += image[y+dy][x+dx];
    return sum / 9.;
}

In this case, a sufficiently smart compiler could determine that dx and dy have the same value for all program instances and thus generate more optimized code from the start, though this optimization isn't yet implemented in ispc. In particular, ispc can avoid the overhead of checking to see if any of the running program instances wants to do another loop iteration. Instead, the compiler can generate code where all instances always do the same iterations."


So since in sinx the loop variable j is uniform, technically ISPC will generate code so that all program instances will execute the same exact instructions (concurrently). Although, not sure about the whole "this optimization isn't yet implemented in ispc."

I went off on a tangent there, but to go back to @kayvonf 's question, based on how the execution model is described in that guide, it seems that ISPC will execute sinx step-by-step across all program instances. Should there be something that keeps track of how many more gangs must be made to finish processing all data? (i.e. the most trivial way I can think of is something like

//ISPC part of assembly starts here
gangs = total data / programCount;
for (int i = 0; i < gangs; i++)
  //SPMD call
//ISPC part of assembly ends here

)

kayvonf

@selena731. Excellent thinking in your post. (Class this is a good example of digging in). Now that you've done that, here are some suggested fixes/clarifications to your post.

  • By "completed" I believe @aaguilet and I were simply referring to completion (termination) of the program instance. That is, when all program instances in the gang complete execution of the logic in ISPC function sinx, the gang itself is complete and the ISPC function then returns to the calling thread of C code.

  • Your statement "it seems that ISPC will execute sinx step by step across all program instances" is completely correct. However, ISPC won't do anything to compute how many gangs are needed. There are only two ways to create a gang of program instances in ISPC:

  1. By calling the IPSC function from C code (sinx in this case). In this case there is one gang per call.
  2. By using launch to create tasks, which are executed by a gang. In this case there is one gang per task.