Slide 43 of 51

Why is M the number of L3 cache lines in this example? If every core only needs to keep track of memory that is hosted by itself, shouldn't M be the number of L2 cache lines?


@yangwu Here the directory belongs to the L3 cache, not the L2 caches, and it must have entries for all lines in the four L2 caches. The directory keeps track of each cache line's state in every L2 cache, so here P is 4 and M is the number of L3 cache lines (the L3 contains all cache lines in the L2 caches, plus some lines that have been evicted from the L2 caches).
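To make the structure concrete, here is a minimal sketch of a full bit-vector directory entry in Python. All names are illustrative (not from the lecture), and I'm assuming P = 4 L2 caches with a single dirty bit per line as a simplified state:

```python
P = 4  # number of L2 caches (one per core), as on this slide

class DirectoryEntry:
    """One entry per L3 cache line: P presence bits plus a dirty flag."""
    def __init__(self):
        self.presence = [False] * P  # bit i set if L2 cache i holds the line
        self.dirty = False           # True if one cache holds it modified

# Because the L3 is inclusive, every line resident in any L2 cache
# is guaranteed to have a corresponding entry in this directory.
directory = {}  # line address -> DirectoryEntry

def on_read(addr, core):
    # Record that `core`'s L2 now shares the line, creating the
    # entry if the line wasn't tracked yet.
    entry = directory.setdefault(addr, DirectoryEntry())
    entry.presence[core] = True
    return entry
```

A real directory would also encode exclusive/owned states and handle invalidations on writes; this only shows why each entry costs P presence bits.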


@Hil. So does that mean each core needs to keep track of all memory? And is there 1 directory, or are there 4?

If there are 4 directories (one per core), wouldn't it be enough for each core to only keep records for memory that is hosted locally?

UPDATE: I think there is only one directory, belonging to the L3 cache, rather than 4 directories (one per core), which explains my confusion.


@yangwu. Correct. There is one directory entry for every line that appears in the L3 cache. By inclusion, this guarantees that the L3 contains a directory entry for any line in any of the L2 caches.


In this case (4 cores, 64-byte cache lines, M-byte L3 cache), a full bit-vector directory scheme takes up only 4 * (M/512) bytes, i.e., 1/128 the size of the L3 cache, or about a 0.8% overhead. (The L3 holds M/64 lines, and each entry needs 4 presence bits, i.e., half a byte.) So in general, for small numbers of cores, a directory-based approach has the benefit of less interconnect traffic than a snooping-based approach, at the cost of just a little more memory overhead.
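The arithmetic above can be checked with a short sketch. The function name and the 8 MB example L3 size are my own choices for illustration:

```python
def directory_overhead_bytes(M, cores=4, line_size=64):
    """Storage for a full bit-vector directory: one presence bit
    per core for each cache line in an M-byte L3."""
    num_lines = M // line_size      # M/64 lines in the L3
    total_bits = cores * num_lines  # 4 bits per line with 4 cores
    return total_bits // 8          # half a byte per line

M = 8 * 1024 * 1024                 # e.g. an 8 MB L3 (hypothetical size)
overhead = directory_overhead_bytes(M)
print(overhead, overhead / M)       # M/128 bytes, ~0.78% of the L3
```

For an 8 MB L3 this comes out to 64 KB of directory state, matching the 1/128 figure in the comment.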