(from slide 41:) Modern computers are parallelized in a few ways:
via multiple processing cores, and
via multiple ALUs within a core.
The former enables multi-thread parallelism, while the latter suits "data-parallel workloads" (e.g. a loop where every iteration accesses the same index of several vectors).
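A loop like the one below (a minimal sketch; the function name and the OpenMP pragma are my own additions, not from the slides) exposes both kinds of parallelism: iterations can be split across cores, and each core's vector ALUs can process several indices per instruction.

```cpp
#include <cstddef>

// Data-parallel workload: each iteration touches only index i of x, y,
// and out, with no dependence on other iterations. Cores can each take a
// chunk of iterations, and the compiler can map each chunk onto SIMD lanes.
void saxpy(float a, const float *x, const float *y, float *out, std::size_t n) {
    #pragma omp parallel for simd  // requires compiling with -fopenmp
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```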
Memory access is often the bottleneck for parallelized tasks. The example from slide 65 demonstrates how a large problem can be efficiently parallelized, yet performance is still limited by the bandwidth of the memory bus, even on a GPU. To overcome this, a well-parallelized program will (from slide 67:)
minimize the number of memory accesses, reusing data within and between threads, and
prioritize arithmetic over memory requests when possible, since ALU throughput is plentiful compared to memory bandwidth (see the sketch after this list).
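As an illustration (a sketch with made-up function names, not anything from the slides), the same computation written as two passes versus one fused pass shows both points: the fused version reuses each loaded value while it sits in a register and does more arithmetic per byte moved across the memory bus.

```cpp
#include <cstddef>

// Two passes: x is streamed through memory twice, and tmp costs a full
// extra write plus read -- the memory bus does most of the work.
void two_pass(const float *x, const float *y, float *tmp, float *out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) tmp[i] = x[i] * x[i];
    for (std::size_t i = 0; i < n; ++i) out[i] = tmp[i] + y[i];
}

// One fused pass: each element is loaded once and reused for both
// operations, roughly halving memory traffic for the same arithmetic.
void fused(const float *x, const float *y, float *out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float xi = x[i];          // load once, reuse
        out[i] = xi * xi + y[i];  // more arithmetic per byte moved
    }
}
```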
The four (I think) key concepts, summarized: