Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Previous | Next --- Slide 26 of 49

blah329

The way that a limited pointer scheme works is that the directory entry is now composed of n pointers, where a pointer is just the ID of the node that has the line in its cache. If you have k processors, then you can represent an ID for that node with ceiling(log2(k)) bits, so your pointers are the same length. Thus, the amount of bits taken up by this new scheme is n * ceiling(log2(k)) per line. Using the numbers in this example, n = 10, and k = 1024 ==> one directory line takes up 100 bits, which is far less than the 1024 bits that it would take using the bit vector that indicates if processor i has the line in its cache by having bit i set. This is a large optimization in terms of space overhead when constructing directories. One interesting point about this is that n is a fixed number. This implies that at most n processors can have this line in their caches. What if n + 1 processors need to access this data? If n processors have the line in their caches already, a possible implementation of how to handle this would be to select a processor that will invalidate its entry for that line in the cache--effectively removing it from the list of processors that hold the line in their caches--and then adding the ID of the new processor that will now hold the line in its cache.