Slide 44 of 69
msebek

What is the advantage of having multiple, small L2 caches connected by a crossbar switch, versus one L2 cache per processor? Am I interpreting the picture incorrectly?

ak47

Also, you mentioned this is size n^2. But does the number of processors necessarily need to match the number of caches? With the crossbar in place there seems to be no point in a 1-to-1 match.
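The n² figure comes from crosspoints: a full crossbar connecting n processors to m caches needs one switch point per (processor, cache) pair, so nothing in the hardware forces n == m. A minimal sketch of that counting argument (the function name is just illustrative):

```python
def crosspoints(n_procs, n_caches):
    """A full crossbar needs one switch point per (processor, cache) pair."""
    return n_procs * n_caches

# Square crossbar: cost grows as n^2 when caches match processors.
print(crosspoints(8, 8))   # 64
# An asymmetric crossbar (e.g. fewer cache banks) is perfectly valid too:
print(crosspoints(8, 4))   # 32
```

So "size n²" describes the common square case, not a requirement that the counts match.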

kk

@msebek Having the L2 caches connected by a crossbar allows a processor to directly access the cache of another processor.

msebek

@kk, I guess I'm curious about L2 caches vs. L3 caches. L2 is private, one per core, whereas L3 is unified: a single L3 shared by all cores on the processor. What is the advantage of having a non-unified (per-core) L2 cache over a unified L2 cache?

I did some googling, and I saw the following explanation. Any prof./TA to confirm/deny?

The L2 cache is shared between one or more L1 caches and is often much, much larger. Whereas the L1 cache is designed to maximize the hit rate, the L2 cache is designed to minimize the miss penalty (the delay incurred when an L1 miss happens).
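The "L1 maximizes hit rate, L2 minimizes miss penalty" tradeoff can be made concrete with the standard average-memory-access-time (AMAT) formula. The latencies below are made-up round numbers for illustration, not measurements from any real chip:

```python
def amat(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, mem_time):
    """Average memory access time for a two-level hierarchy:
    AMAT = L1_hit + L1_miss_rate * (L2_hit + L2_miss_rate * mem_time)."""
    return l1_hit_time + l1_miss_rate * (l2_hit_time + l2_miss_rate * mem_time)

# Illustrative cycle counts: tiny fast L1, larger slower L2, slow DRAM.
# Even a modest L2 hit rate sharply cuts the cost of each L1 miss.
print(amat(4, 0.10, 12, 0.25, 200))
```

Because the L2 term is multiplied by the L1 miss rate, making the L2 big (low miss rate) matters more than making it as fast as the L1; that is the design asymmetry the quoted explanation describes.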

For chips that have L3 caches, the purpose is specific to the design of the chip. For Intel, L3 caches first appeared in 4-way multi-processor systems (Pentium 4 Xeon MP processors) in 2002. L3 caches in this sense greatly reduced delays in multi-threaded environments and took load off the front-side bus (FSB). At the time, L3 caches were still dedicated to each single-core processor, until Intel Dual-Core Xeon processors became available in 2006. In 2009, L3 caches became a mainstay of the Nehalem microprocessors on desktop and multi-socket server systems.

ESINNG

With a cache shared by all cores, the cores can share data. Consider this situation: a thread is running on core 1, gets switched out, and later resumes on core 2. If there is no cache shared across cores, it has to fetch its data from memory again, because every access on core 2 misses. With a cache shared by all cores, it can fetch the data from that cache instead, which improves speed. @msebek, I think your explanation is good.
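The migration scenario above can be sketched with a toy model: a private per-core cache goes cold when the thread moves to another core, while a shared cache still hits. This is purely illustrative (sets standing in for caches, no eviction or coherence modeled):

```python
class PrivateCaches:
    """One cache per core; a migrated thread starts cold on its new core."""
    def __init__(self, n_cores):
        self.caches = [set() for _ in range(n_cores)]
    def access(self, core, addr):
        hit = addr in self.caches[core]
        self.caches[core].add(addr)   # fill the line on a miss
        return hit

class SharedCache:
    """One cache visible to every core; migration causes no extra misses."""
    def __init__(self):
        self.cache = set()
    def access(self, core, addr):
        hit = addr in self.cache
        self.cache.add(addr)
        return hit

priv, shared = PrivateCaches(2), SharedCache()
priv.access(0, 0x1000); shared.access(0, 0x1000)  # thread warms caches on core 0
print(priv.access(1, 0x1000))    # False: miss after migrating to core 1
print(shared.access(1, 0x1000))  # True: shared cache still holds the line
```

In a real machine the private-cache miss may be serviced from the other core's cache via the interconnect rather than from DRAM, but it is still slower than a hit in a shared cache.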