I think I understand what L3 line unshared, shared line in another core, and modified in another core mean, but what does remote L3 CACHE refer to in terms of hardware?
kayvonf
@yuel1: In Intel's modern "Core" architecture, although the L3 cache is a logically shared cache accessible by all cores as drawn here, the L3 cache SRAM is physically partitioned among the cores (an implementation detail). Therefore, the L3 cache has non-uniform access characteristics. In a quad-core chip, that 8 MB L3 is made up of four 2 MB slices (or "banks"), one in each core. A processor's L3 hit latency is lower if the address maps to the local slice of the L3. It's a very modular design.
yuel1
@kayvonf: I think I was a little vague in my question. What struck me as odd was the remote L3 cache access that takes ~100-300 cycles. Is that in reference to L3 cache slices that are not immediate neighbors of each other?
https://software.intel.com/en-us/forums/topic/287236
Take Intel i7 for an example. Here are the approximate latencies to access L1, L2, L3 caches and main memory:
L1 CACHE hit, ~4 cycles
L2 CACHE hit, ~10 cycles
L3 CACHE hit, line unshared ~40 cycles
L3 CACHE hit, shared line in another core ~65 cycles
L3 CACHE hit, modified in another core ~75 cycles
remote L3 CACHE ~100-300 cycles
Are more levels of CPU cache simply better? Will we see an L4 cache appear in the future?
Found this on the internet: http://hardware.slashdot.org/story/13/11/23/0137253/intels-128mb-l4-cache-may-be-coming-to-broadwell-and-other-future-cpus