Previous | Next --- Slide 33 of 65
Back to Lecture Thumbnails
acappiello

Although we were able to use all of these methods with our sinx example, none of them come off as a "perfect" method for taking advantage of SIMD. Intrinsics likely gives you the most precision to optimize the code, but puts a lot more work on the programmer and won't work with processors that don't support the instructions. Similar to "Kayvon's fictitious data-parallel language," MATLAB (though not a compiled language) has a parfor construct like used here with forall. The tradeoff when using parfor is that the interpreter needs to be sure that iterations are, in fact, independent, so it places restrictions on changing other variables in the loop body. As already stated, inferring parallelism likely will not be detected for intricate programs.

I guess the takeaway message is that we discussed so many different methods for making this one example (sinx) parallel because there isn't one right answer and knowing a variety of strategies is necessary to write effective parallel code.