Parallel Computer Architecture and Programming (CMU 15-418/618)

This page contains lecture slides, videos, and recommended readings for the Spring 2017 offering of 15-418/618. The full listing of lecture videos is available here.

(motivations for parallel chip decisions, challenges of parallelizing code)
Further Reading:
(forms of parallelism, understanding latency and bandwidth)
Further Reading:
(ways of thinking about parallel programs, and their corresponding hardware implementations)
(the thought process of parallelizing a program)
(CUDA programming abstractions, and how they are implemented on modern GPUs)
Further Reading:
(achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing)
(message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention)
(examples of optimizing parallel programs)
(hard vs. soft scaling, memory-constrained scaling, scaling problem size, tips for analyzing code performance)
Further Reading:
(definition of memory coherence, invalidation-based coherence using MSI and MESI, maintaining coherence with multi-level caches, false sharing)
(scaling problem of snooping, implementation of directories, directory storage optimization)
Further Reading:
(deadlock, livelock, starvation, implementation of coherence on an atomic and split-transaction bus)
(consistency vs. coherence, relaxed consistency models and their motivation, acquire/release semantics)
(scale out, load balancing, elasticity, caching)
Further Reading:
(network properties, topology, basics of flow control)
(machine-level atomic operations, implementing locks, implementing barriers)
(fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers)
Further Reading:
(motivation for transactions, design space of transactional memory implementations, lazy-optimistic HTM)
Further Reading:
(energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, what's in a modern SoC)
(GraphLab, Ligra, and GraphChi, streaming graph processing, graph compression)
Further Reading:
(intro to deep networks, what a convolution does, mapping convolution to matrix multiplication, deep network compression)
Further Reading:
(basics of gradient descent and backpropagation, memory footprint issues, asynchronous parallel implementations of gradient descent)
(how DRAM works, cache compression, DRAM compression, upcoming memory technologies like 3D stacking)
(supercomputing vs. distributed computing/analytics, design philosophy of both systems)
(Guest lecture by Andy Pavlo)
(concurrency control in databases, transactions, two-phase locking, timestamp ordering)
(tips for giving a clear talk, a bit of philosophy)
Further Reading: