Slide 4 of 79

The four (I guess?) key concepts summarized:

  • (from slide 41:) Modern computers are parallelized in a few ways:
    • via multiple processing cores, and
    • via multiple ALUs within a core. The former allows for multi-threaded parallelism, while the latter is good for "data-parallel workloads" (e.g., accessing the same index of multiple vectors in each iteration of a loop).
  • Memory access is often the bottleneck for parallelized tasks. The example from slide 65 demonstrates how a large problem can be efficiently parallelized, yet performance is still capped by the limited bandwidth of the memory bus, even on a GPU. To overcome this, a well-parallelized program will (from slide 67:)
    • minimize the number of memory accesses, reusing data within and between threads, and
    • prioritize arithmetic over memory requests when possible, since ALU throughput is plentiful compared to memory bandwidth.