Slide 8 of 66

The contents of L1 are a strict subset of the contents of L2 in the following situation: L2 is larger than L1, and L1 is full. When the processor tries to access X, which is not in L1, a line is evicted from L1 and both L1 and L2 allocate a line for X.
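The behavior described above can be sketched as a toy model. Everything here (two-level LRU lists, the capacities, the class name) is an illustrative assumption, not from the slide; the point is just the invariant that L1's contents stay a subset of L2's:

```python
class InclusiveCache:
    """Toy model: every line resident in L1 must also be resident in L2."""

    def __init__(self, l1_lines=2, l2_lines=4):
        self.l1_lines, self.l2_lines = l1_lines, l2_lines
        self.l1, self.l2 = [], []                # LRU order: oldest line first

    def access(self, addr):
        if addr in self.l1:                      # L1 hit: just update LRU order
            self.l1.remove(addr)
            self.l1.append(addr)
            return
        if addr not in self.l2:                  # miss in both levels: fill L2
            if len(self.l2) == self.l2_lines:
                victim = self.l2.pop(0)
                # Inclusion: evicting a line from L2 must also invalidate
                # any copy of it in L1.
                if victim in self.l1:
                    self.l1.remove(victim)
            self.l2.append(addr)
        if len(self.l1) == self.l1_lines:        # L1 full: evict its LRU line
            self.l1.pop(0)                       # (the copy stays in L2)
        self.l1.append(addr)
        assert set(self.l1) <= set(self.l2)      # the inclusion invariant
```

Note that an L1 eviction leaves the line in L2, but an L2 eviction forces the line out of L1 as well, which is exactly what keeps L1 a subset of L2.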


I am still confused about the benefits of the inclusion property of caches. It seems a little wasteful to store the same information in both the L1 and L2 caches. The previous slide says that having all caches snoop the interconnect independently is inefficient; how does inclusion mitigate that?


It is a little wasteful to duplicate the data. AMD processors have in the past used exclusive caches, which are an alternative to inclusion. However, without inclusion the L1 must snoop the interconnect itself, so it must also have the logic and mechanisms to understand and respond to bus transactions. Furthermore, when a transaction requires all processors to respond, both the L1 and the L2 of each processor have to indicate their responses, which costs more time and energy. These are just two of the many reasons in favor of the inclusion property.


@ote It's true that you pay a memory penalty by adhering to the inclusion property, but note that it also adds a lot of simplicity, and possibly performance, to the caching model. Kayvon mentioned in class that one approach to tracking writes in the L1 is to keep a bit on the L2 cache line recording whether the line may be modified in L1, so the L2 knows whether it must flush the data from L1 when it invalidates or flushes the line. Without inclusion we would need a different approach, with more logic (performance overhead) and more storage (memory overhead), to look up changes in the L1 cache on an eviction. Inclusion avoids this drawback.

In summary: The inclusion principle simplifies the L1 - L2 caching model and helps performance by giving the L2 cache a simple and fast way to check whether the L1 cache line has been modified, at the expense of redundant cache line storage between L1 and L2.
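A minimal sketch of that "modified-in-L1" bit idea. The field and function names here are hypothetical, chosen only to illustrate the bookkeeping; real hardware state machines are far more involved:

```python
class L2Line:
    """Hypothetical per-line L2 metadata under an inclusive hierarchy."""

    def __init__(self, addr):
        self.addr = addr
        self.modified_in_l1 = False   # L1 may hold a newer (dirty) copy

def on_l1_write(l2_line):
    # Set when L1 takes the line for writing; clearing it on L1
    # writeback is omitted for brevity.
    l2_line.modified_in_l1 = True

def on_l2_evict(l2_line, l1_flush):
    # Thanks to inclusion, a single bit tells L2 whether it must pull
    # the up-to-date data out of L1 before discarding the line; without
    # inclusion it would need an L1 lookup on every eviction.
    if l2_line.modified_in_l1:
        l1_flush(l2_line.addr)
```

The design point is that L2 can answer "does L1 have a dirty copy?" from its own state, without interrogating L1.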


Are there situations in which it would be better to abandon the inclusion principle to gain back that extra cache storage?


@jaguar, actually an exclusive cache hierarchy can have better performance and higher capacity in the general case. You can get some insights from this paper
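For contrast with inclusion, here is a toy sketch of an exclusive hierarchy (again with made-up capacities and a simple LRU list): each line lives in exactly one level, so the effective capacity is L1 plus L2, and an L1 victim is moved into L2 rather than duplicated there:

```python
class ExclusiveCache:
    """Toy model: a line is resident in at most one of L1 and L2."""

    def __init__(self, l1_lines=2, l2_lines=4):
        self.l1_lines, self.l2_lines = l1_lines, l2_lines
        self.l1, self.l2 = [], []                 # LRU order: oldest line first

    def access(self, addr):
        if addr in self.l1:                       # L1 hit: update LRU order
            self.l1.remove(addr)
            self.l1.append(addr)
            return
        if addr in self.l2:                       # L2 hit: move (not copy) up
            self.l2.remove(addr)
        if len(self.l1) == self.l1_lines:         # L1 full: victim falls to L2
            victim = self.l1.pop(0)
            if len(self.l2) == self.l2_lines:
                self.l2.pop(0)                    # drop L2's oldest line
            self.l2.append(victim)
        self.l1.append(addr)
        assert not (set(self.l1) & set(self.l2))  # levels stay disjoint
```

The flip side, as the replies above note, is that L2 no longer covers L1, so snooping and eviction handling can't be confined to L2.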