Slide 8 of 57
418_touhenying

It seems this was not discussed, or I missed it: how are these program instances actually handled? In what order do they run, and do they really run in parallel?

Also, are these decisions made by the OS or directly by the hardware? Thanks.

pkoenig10

The purpose of these slides was to separate the abstraction from the implementation. This slide discusses the abstraction that ISPC uses and the mental model to use when writing ISPC code. You are asking questions about the actual implementation, which is not the focus here.

That said, it is my understanding that ISPC code is compiled to use specific SIMD instructions (for example https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions). The processor knows how to execute these SIMD instructions (as it would any other x86 instruction) using its SIMD ALU. The OS has nothing to do with how individual instructions are executed; it loads the program into memory and schedules it onto the processor, and the processor fetches and executes the instructions itself.
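To make "compiled to SIMD instructions" a bit more concrete, here is a minimal hand-written sketch in C using SSE intrinsics. It is not actual ISPC compiler output, and `add_arrays` is a hypothetical helper invented for illustration. The assumption is an x86 target with SSE; on other targets the `#if` guard falls back to the plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
#if defined(__SSE__)
    /* Each iteration handles four floats with one vector add.
       The processor executes it on its SIMD ALU; the OS is not involved. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: leftover elements, or the whole array without SSE. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Calling `add_arrays(a, b, out, 6)` on six-element arrays would take one pass through the vector loop (elements 0 to 3) and then finish elements 4 and 5 in the scalar loop.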

krombopulos_michael

So SIMD and SPMD both seem to allow parallel computation on independent data, albeit by different methods. Should we expect a 4-wide SIMD execution of a program and an SPMD execution to speed up the program by the same amount?

colorblue

@krombopulos_michael

This mostly depends on the problem. If the program gains a large advantage from having more instruction streams, then SPMD is the way to go.

However, if the program is hypothetically optimized for both SIMD and SPMD, and the machine supports both equally well, then the speedup will be greater with SIMD, because SIMD maps simply onto the architecture while SPMD can be expensive to set up.

narainsk

Can ISPC functions use concurrent mechanisms, e.g. Pthreads?

fleventyfive

@narainsk: That is precisely the main difference between the abstraction and implementation! We would never know how an "abstraction" is "implemented", unless we look into the code or documentation! So to answer your question, ISPC functions may use Pthreads for their implementation, or might use another implementation, like perhaps OpenMP (which is again a parallelization abstraction, and may use pthreads for its implementation)!

MaxFlowMinCut

@colorblue It is important to realize that SPMD is an abstraction, and it should not be considered "expensive to set up" precisely because it is abstracted away from any particular implementation. Indeed, in the context of ISPC, our SPMD model is that of a gang of program instances that is launched when an ISPC function is called. The implementation, however (assuming tasks are not launched), is that the compiler takes what is basically C code and outputs SIMD vector instructions. In other words, compiling C into SIMD is the implementation of ISPC's SPMD programming model, so there is no real distinction between being optimized for SIMD and for SPMD in this case.
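The distinction between the gang abstraction and its SIMD implementation can be sketched in plain C. This is only an emulation under stated assumptions: the gang size `PROGRAM_COUNT` of 4, the interleaved assignment of elements to instances, and all function names are hypothetical choices made for illustration, and real ISPC instances in a gang run together rather than one after another as in `square_gang`.

```c
#include <assert.h>

#define PROGRAM_COUNT 4  /* assumed gang size, for illustration only */

/* Abstraction: the work of ONE program instance. The instance with index
   program_index handles elements program_index, program_index + 4, ... */
static void square_instance(int program_index, const float *x, float *y, int n)
{
    for (int i = program_index; i < n; i += PROGRAM_COUNT)
        y[i] = x[i] * x[i];
}

/* Abstraction: a call "launches" a gang of PROGRAM_COUNT instances.
   Here they are emulated sequentially; the result is the same because
   the instances touch disjoint elements. */
void square_gang(const float *x, float *y, int n)
{
    assert(n >= 0);
    for (int program_index = 0; program_index < PROGRAM_COUNT; program_index++)
        square_instance(program_index, x, y, n);
}

/* Implementation view: ONE instruction stream that steps through the data
   in chunks of PROGRAM_COUNT. The inner lane loop stands in for a single
   vector instruction operating on all lanes at once. */
void square_lockstep(const float *x, float *y, int n)
{
    assert(n >= 0);
    for (int base = 0; base < n; base += PROGRAM_COUNT)
        for (int lane = 0; lane < PROGRAM_COUNT; lane++)
            if (base + lane < n)  /* mask off lanes past the end */
                y[base + lane] = x[base + lane] * x[base + lane];
}
```

Both functions compute the same output. The first is written the way one reasons about the program (many instances), the second the way the hardware would run it (one stream of vector-wide steps), which is why no separate "setup" of program instances is needed.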