A real-life example of this: when I took compilers, I did not fully optimize my register allocator, so local variables were stored in stack memory instead of registers. The performance hit was pretty big: programs could run up to 15x slower under those circumstances. In my case, this made the difference between a full score on the lab and losing a few points.
register allocation is very very important... :'(
Looking at the cores in the slide, there are L1, L2, and L3 caches. What is the purpose of having a multi-level cache within the CPU rather than a single level? Does it also affect how caching is designed?
@acfeng in order of speed, from fastest to slowest (both in bandwidth and latency) (recap from 213):
The reason for multi-level caching is that L1 is faster but smaller than L2, and so on down the hierarchy. We want to keep as much data as possible as close to the core as possible, but each cache is limited in size and/or speed, so we use multiple levels to work around these tradeoffs. L1 is very expensive to build in terms of hardware resources; L3 is much cheaper.
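To put rough numbers on that tradeoff, here is a minimal sketch of the usual average-access-time calculation. The latencies and hit rates are hypothetical values picked for illustration, not measurements of any real CPU, and the function name `average_access_cycles` is my own. It also assumes the levels are checked one after another, which real hardware does not always do.

```python
def average_access_cycles(levels, memory_cycles):
    """Average cycles per access for a cache hierarchy (simplified model).

    levels: list of (lookup_cycles, local_hit_rate), ordered from closest
            to the core (L1) to farthest.
    memory_cycles: cost of going all the way to main memory.
    Assumes levels are checked one after another, so an access pays the
    lookup cost of every level it reaches.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for lookup_cycles, hit_rate in levels:
        total += reach * lookup_cycles
        reach *= 1.0 - hit_rate  # only the misses continue to the next level
    return total + reach * memory_cycles


# Hypothetical numbers, chosen only for illustration (not measured).
MEMORY_CYCLES = 200
three_level = [(4, 0.90), (12, 0.80), (40, 0.75)]  # L1, L2, L3
one_big_cache = [(40, 0.995)]  # L3-sized and L3-speed, same overall miss rate

multi = average_access_cycles(three_level, MEMORY_CYCLES)
single = average_access_cycles(one_big_cache, MEMORY_CYCLES)
print(f"three levels: {multi:.1f} cycles per access")
print(f"single level: {single:.1f} cycles per access")
```

With these made-up numbers, both setups go to main memory equally often, but the three-level hierarchy averages about 7 cycles per access versus about 41 for the single large cache, because most accesses are served by the small, fast L1.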