From lecture, it seems that SIMD considerably improves performance for loop iterations that can run independently. Are there any other situations where SIMD is beneficial?
@rjvani From what I understood in lecture, SIMD is useful in situations where the ratio of arithmetic instructions to memory-access instructions is high, so that memory latency is not the bottleneck. One example is the work handled by GPUs, which involves arithmetic-intensive parallel computation with few conditional branches and comparatively few memory accesses. I found a (slightly dated) slide deck from 15-462 with helpful, detailed information about how a GPU works: https://www.cs.cmu.edu/afs/cs/academic/class/15462-f11/www/lec_slides/lec19.pdf
Here, in this fictitious data-parallel language, you can declare that the iterations of the loop are independent. Are there real-world examples of languages with this kind of baked-in concurrency primitive? I was thinking of Go's goroutines and channels.
@dmerigou Check out Cilk. Its cilk_for construct, with help from the compiler and a work-stealing runtime, dynamically assigns independent loop iterations to worker threads.
@dmerigou MATLAB also has parallelism facilities that allow for this. Something you might want to look into is MATLAB's parfor construct; it lets you run iterations of a loop in parallel.