In this slide, the directory takes approximately M/2 bytes.
With 64-byte cache lines, a memory of M bytes has M/64 lines, so there are also M/64 directory entries. Each entry needs 257 bits: 1 dirty bit plus 256 presence bits (one per processor), which rounds to about 32 bytes. So (M/64 entries) * (32 bytes per entry) ≈ M/2 bytes.
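The arithmetic above can be sketched in a few lines. This is just an illustration of the slide's calculation; the 64-byte line size and 256-processor count are the assumptions stated in the thread.

```python
# Sketch of the directory-overhead arithmetic from the slide:
# one directory entry per 64-byte cache line, each entry holding
# 1 dirty bit plus one presence bit per processor (256 assumed here).

CACHE_LINE_BYTES = 64
NUM_PROCESSORS = 256

def directory_bytes(mem_bytes: int) -> float:
    entries = mem_bytes / CACHE_LINE_BYTES   # M/64 directory entries
    bits_per_entry = 1 + NUM_PROCESSORS      # dirty bit + presence bits = 257
    return entries * bits_per_entry / 8      # total directory size in bytes

M = 2**30  # 1 GiB of memory, as an example
print(directory_bytes(M) / M)  # overhead fraction: 257/512 ~ 0.502, i.e. ~M/2
```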
I wonder, why do we keep the directory in the private cache of the CPU? From what it looks like to me, each core keeps the directory in its own private cache rather than in the L3 cache. Why not just keep the directory in the L3 cache? Then when a query happens, another core could find out which processor holds that line and request a read from there, rather than first querying a different core and having that core look it up in its own directory.
@rmanne It seems that this example system divides its main memory into several chunks, each distributed to a node with a directory and a processor (w/ cache). There is no L3 or L2 cache, just P nodes connected by an interconnect. The later slides on Intel's cache systems do keep the directory in the L3 cache rather than using these distributed directories.
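To make the distributed scheme concrete: each line's directory entry lives at that line's "home" node, typically determined by its address. A minimal sketch, assuming simple line-interleaved address mapping and a hypothetical node count of 16 (neither is specified in the thread):

```python
# Hypothetical sketch: mapping a physical address to the "home" node
# that holds the line's directory entry in a distributed-directory system.
NUM_NODES = 16    # assumption; the thread only says "P nodes"
LINE_BYTES = 64   # cache line size from the slide

def home_node(addr: int) -> int:
    # Simple interleaving: line number modulo node count.
    # A requester sends its coherence message straight to this node,
    # which consults its local directory slice.
    return (addr // LINE_BYTES) % NUM_NODES

print(home_node(0x1040))  # line number 0x41 -> node 1
```

With this mapping a core never has to guess which node to ask: the address itself names the directory's owner.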
@Josephus Aha, thanks! Yeah, they're on slide 43.