Slide 12 of 48
taz

Most automatic decomposition approaches look for loops whose iterations can run independently in parallel, while minimizing synchronization overhead. The primary constraint on parallelism is that certain accesses to shared variables must be serialized so that the parallel program produces the same results as the original program. One common approach is to split the program into parallelizable loops and serial blocks, and then insert barrier synchronization points between phases as necessary.

crow

Here is an article on how GCC compiles code to take advantage of vector (SIMD) instructions. It is quite difficult to get auto-vectorization working properly!

https://locklessinc.com/articles/vectorize/

cwchang

So one takeaway from this class is the skill of identifying "decomposable" tasks. Sometimes we think an algorithm is inherently sequential and can't be parallelized. Yet, if we look more closely, we might find that the lower-level computation (such as an inner loop) is decomposable. That way, we can still achieve a partial speedup.

bharadwaj

Identifying smart ways to decompose problems to exploit hardware that runs in parallel has been the biggest skill we've learned so far. Decomposition appears to be hard even with parallel frameworks (such as ISPC) that make expressing it easier, so I'm interested in knowing how well existing automated approaches actually perform.