The key assumption here is that in practice, many fewer than $P$ processors will be referencing a cache line at any given time. By instead having a fixed number $n$ of pointers per cache line that each reference a processor that is referencing that cache line (if there is one), we instead have the memory overhead of $n \cdot \log_{2}(P)$ bits to keep track of a cache line, rather than the full $P$ bits (which quickly becomes untenable as $P$ gets larger).

sasthana

@MaxFlowMinCut - I think I have understood this slide the way you have articulated it here. Taking an example with reference to the slide here, instead of keeping it general, I hope this is what the slide means.

If we decide we need only 30 bits per cache line instead of 1024 then we can allow sharing among at most 3 processors. This is because we need 10 bits to label a single processor if there are 1024 processors.

If it so happens that a fourth processor wants to share the line as well then we can use one of the ways listed on the next slide. Like fallback to broadcast or eviction.

a

Would we ever want to use the numbers described in this slide? In this example, would we ever want to store 100 10-bit pointers over a full bit vector scheme?

yikesaiting

I think this graph clearly implies one important concept of first half of semester: "Simple strategy for most cases and tolerable complicated scheme for corner cases." And I think that is what Prof. Keyvon wants to tell us

The key assumption here is that in practice, many fewer than $P$ processors will be referencing a cache line at any given time. By instead having a fixed number $n$ of pointers per cache line that each reference a processor that is referencing that cache line (if there is one), we instead have the memory overhead of $n \cdot \log_{2}(P)$ bits to keep track of a cache line, rather than the full $P$ bits (which quickly becomes untenable as $P$ gets larger).

@MaxFlowMinCut - I think I have understood this slide the way you have articulated it here. Taking an example with reference to the slide here, instead of keeping it general, I hope this is what the slide means.

If we decide we need only 30 bits per cache line instead of 1024 then we can allow sharing among at most 3 processors. This is because we need 10 bits to label a single processor if there are 1024 processors.

If it so happens that a fourth processor wants to share the line as well then we can use one of the ways listed on the next slide. Like fallback to broadcast or eviction.

Would we ever want to use the numbers described in this slide? In this example, would we ever want to store 100 10-bit pointers over a full bit vector scheme?

I think this graph clearly implies one important concept of first half of semester: "Simple strategy for most cases and tolerable complicated scheme for corner cases." And I think that is what Prof. Keyvon wants to tell us