Slide 37 of 65
kayvonf

For the architecture wonks:

realworldtech.com tends to have technical, but quite accessible, articles that describe the architecture of modern processors. This article on Intel's Sandy Bridge architecture is a good description of a modern CPU. It's probably best to start at the beginning, but Page 6 talks about the execution units.

rutwikparikh

An interesting question to think about is: "What sorts of programs would run well using multiple cores but not with SIMD?" Consider code with a switch statement where one of the cases dominates the work. Under SIMD execution, the one instance taking that case stays busy while the other instances are masked off, wasting lanes and hurting performance. With threads on multiple cores, however, we can distribute the work so that most threads stay busy on the dominating case, while the remaining cases finish almost instantly.
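A toy cost model can make the masking cost concrete (this is a sketch, not real ISA behavior; the per-case costs and the 8-wide gang are made-up numbers). A SIMD gang pays for every distinct case that any of its lanes takes, while scalar cores that split pre-sorted work only pay for the total amount of work:

```python
# Assumed per-case costs in "cycles"; case 0 dominates the work.
CASE_COST = {0: 100, 1: 1, 2: 1}

def simd_cycles(cases, width=8):
    """Cycles for SIMD execution: each gang serially executes every
    distinct case present among its lanes (masked lanes just wait)."""
    total = 0
    for i in range(0, len(cases), width):
        gang = cases[i:i + width]
        total += sum(CASE_COST[c] for c in set(gang))
    return total

def multicore_cycles(cases, cores=8):
    """Cycles for scalar cores with work distributed so like cases run
    together: total work divided across cores (idealized balance)."""
    work = sum(CASE_COST[c] for c in cases)
    return -(-work // cores)  # ceiling division

# 64 elements with cases interleaved, so every gang contains case 0
# and pays its full cost while six of eight lanes sit masked.
cases = [i % 3 for i in range(64)]
print(simd_cycles(cases))       # one core, 8-wide SIMD
print(multicore_cycles(cases))  # 8 scalar cores, sorted work
```

In this model the SIMD version is several times slower, exactly because the dominating case shows up in every gang and masks out most lanes.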

mmp

@rutwikparikh I think that both that question and the converse--what sorts of programs run well in SIMD but not with multiple cores--are really interesting ones. Pondering them both gives a lot of useful insight about HW architectures and parallelism.

On the "what runs well with SIMD but less well with multiple cores" front: at first, one might think that multiple cores would always be preferable, since that's MIMD, and thus more general than SIMD (and handles cases like divergent control flow well, as you note). However, the nice thing about SIMD is that communication between program instances in a gang (ISPC) or threads in a warp (CUDA) is essentially free. Communication between cores is much slower, especially if it has to go through a cache coherence protocol. Further, because a gang or a warp runs in lockstep, you don't need the same amount of synchronization you do across cores. So for communication-heavy computation, SIMD can be a better target.
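To see why intra-gang communication is so cheap, here's a sketch (plain Python as a model, not real SIMD) of the butterfly exchange pattern behind shuffle-based warp reductions: an 8-wide sum takes only log2(8) = 3 cross-lane exchange steps, with no locks and no cache traffic, because the lanes share the same registers:

```python
def gang_reduce_add(lanes):
    """Butterfly reduction across lanes: at each step, lane i adds the
    value held by lane (i XOR step), halving the step until every lane
    holds the full sum -- the pattern used by warp shuffle reductions."""
    lanes = list(lanes)
    n = len(lanes)  # assumed to be a power of two
    step = n // 2
    while step >= 1:
        lanes = [lanes[i] + lanes[i ^ step] for i in range(n)]
        step //= 2
    return lanes[0]  # every lane now holds the same total

print(gang_reduce_add([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

Doing the same reduction across cores would require shared memory plus a barrier (or atomics), with each round of communication costing far more than a register exchange.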

Another issue is the overhead of starting parallel computation. With SIMD, you can be deep in some function operating on some data and opportunistically apply SIMD. There's essentially nil overhead for the transition (on a CPU at least). To launch work across CPU cores, there will unavoidably be higher overhead to enqueue work in a task system or the like.
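A back-of-the-envelope model shows the break-even point this overhead creates (the 5000-cycle task-launch cost is an assumed illustrative number, not a measurement; the model also assumes each core uses SIMD internally):

```python
def best_strategy(work_cycles, simd_width=8, cores=8, task_overhead=5000):
    """Compare applying SIMD inline (zero entry cost) against fanning
    the same SIMD work out across cores behind a task-launch overhead."""
    simd_time = work_cycles / simd_width
    multicore_time = task_overhead + work_cycles / (cores * simd_width)
    return "simd" if simd_time <= multicore_time else "multicore"

print(best_strategy(10_000))     # small job: inline SIMD wins
print(best_strategy(1_000_000))  # big job: worth paying the launch cost
```

Under these assumed numbers, tens of thousands of cycles of work are needed before launching tasks pays for itself, which is why SIMD is the only parallelism worth reaching for deep inside a small function.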

Finally (related), there's the granularity at which it's worth going parallel. Launching tasks on other cores isn't worth it for a small amount of work, but again SIMD can be applied to many little problems pretty easily. (But on the other hand, there's always Amdahl's law, so a little SIMD in the middle of a big program doesn't necessarily buy you anything.)
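The Amdahl's-law point is easy to put in numbers: if a fraction p of the program's runtime is sped up by a factor s (say, 8-wide SIMD), the overall speedup is 1 / ((1 - p) + p / s), so a little SIMD in the middle of a big program buys very little:

```python
def amdahl_speedup(p, s):
    """Overall speedup when fraction p of runtime gets an s-x speedup."""
    return 1.0 / ((1.0 - p) + p / s)

# Vectorizing 10% of a program 8x gives barely any overall speedup...
print(round(amdahl_speedup(0.10, 8), 3))
# ...while vectorizing 90% of it at the same 8x helps a lot.
print(round(amdahl_speedup(0.90, 8), 3))
```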

Anyway, it's interesting stuff to think about.