Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

chuangxuean

A real-life example of this: when I took compilers, I did not fully optimize my register allocator, so local variables ended up in stack memory instead of registers. The performance penalty was pretty big; programs could run up to 15x slower in such cases. For me, this made the difference between a full score on the lab and losing a few points.

vincom2

register allocation is very very important... :'(

acfeng

Looking at the cores in the slide, there are L1, L2, and L3 caches. What is the purpose of having multiple cache levels within the CPU rather than a single level? Does it also affect how the caching itself is designed?

hofstee

@acfeng In order of speed, from fastest to slowest in both bandwidth and latency (recap from 213):
Registers
L1
L2
L3
Memory
Storage
Web

The point of multi-level caching is that L1 is faster but smaller than L2, and so on down the hierarchy. We want to keep as much data as possible as close to the core as possible, but each level is limited in size and/or speed, so we use multiple levels to work around these tradeoffs. L1 is very expensive to build in terms of hardware resources; L3 is much cheaper.
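
To put rough numbers on that tradeoff, here is a minimal sketch that compares the expected access time of a three-level hierarchy against a single large, slow cache with the same overall hit rate. All latencies and hit rates are made-up illustrative values, not measurements of any real CPU, and the model assumes a simple serial lookup where an access pays the latency of every level it reaches. The function name `average_access_time` is just a name chosen for this example.

```python
# Illustrative numbers only: the latencies (in cycles) and hit rates below are
# assumptions for the sake of the example, not data for any particular processor.

def average_access_time(levels, memory_latency):
    """Expected cycles per access for caches searched one level at a time.

    levels: list of (lookup_latency_cycles, hit_rate), ordered closest to farthest.
    memory_latency: cycles paid by accesses that miss in every cache level.
    """
    expected = 0.0
    reach_prob = 1.0  # probability that an access gets as far as this level
    for latency, hit_rate in levels:
        # Every access that reaches this level pays its lookup latency.
        expected += reach_prob * latency
        # Only the misses continue on to the next level.
        reach_prob *= 1.0 - hit_rate
    return expected + reach_prob * memory_latency


MEMORY_LATENCY = 200  # assumed cycles to reach main memory

# Small and fast L1, larger and slower L2, largest and slowest L3.
three_level = [(4, 0.90), (12, 0.80), (40, 0.75)]

# One big cache with L3-like latency and the same overall hit rate (99.5%).
single_level = [(40, 0.995)]

print(f"three levels: {average_access_time(three_level, MEMORY_LATENCY):.1f} cycles")
print(f"single level: {average_access_time(single_level, MEMORY_LATENCY):.1f} cycles")
```

Under these assumed numbers, both configurations miss to memory equally often, but the three-level hierarchy averages about 7 cycles per access versus about 41 for the single large cache, because most accesses are served by the small, fast L1.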