Parallel Computer Architecture and Programming (CMU 15-418/618)

This page contains lecture slides, videos, and recommended readings for the Spring 2015 offering of 15-418/618. The full listing of lecture videos is available on the Panopto site.

(forms of parallelism, understanding latency and bandwidth)
(and their corresponding hardware implementations)
(the thought process of parallelizing a program)
(CUDA programming abstractions, and how they are implemented on modern GPUs)
(the tension between achieving good work balance and minimizing the overhead of making the assignment)
(techniques for reducing communication and contention, inherent vs. artifactual communication)
(a few examples of parallelizing algorithms)
(evaluating program performance, how to scale performance analysis "up" and "down")
(the basics of cache coherence, the MSI and MESI protocols)
(evaluating the performance of snooping implementations, coherence in a multi-level cache hierarchy)
(why directories enable scalable cache coherence, reducing the overhead of directory storage)
(the motivation for and implications of relaxed consistency memory models)
(scale-out parallelism, elasticity, and significant amounts of caching)
(the challenges of implementing invalidation-based coherence in a real system)
(challenges of fine-grained locking, basics of lock-free data structures)
(network-on-a-chip topologies and flow-control algorithms)
(motivation for transactional memory, the design space of implementations)
(the basics of Cilk's locality-aware, work-stealing scheduler)
(area- and energy-efficient computing via heterogeneous parallel processors)
(examples from GraphLab, Ligra, and Green-Marl, discussion of what makes a good programming system)
(parallelization issues in modern databases, a lecture by Andy Pavlo)
(the RDD abstraction and how it enables efficient, distributed processing)
(how DRAM works and modern hardware approaches to improving locality and bandwidth)
(triangle rasterization as a sampling problem, parallel rasterization, hardware z-buffer compression)
This was a bonus lecture and was not recorded.
(Exam 2 review, how to give a good talk, course summary)