Slide 24 of 36
lixf

Is it appropriate to think of the "interconnect" as a queue? What happens if the queue is full?

spilledmilk

It appears that the interconnect, at least in Intel's case, consists of two 20-lane point-to-point data links, 20 lanes in each direction. My guess is that if all the lanes in one direction are occupied and a data transfer is attempted in that direction, the request stalls (perhaps spinning?) until a lane opens up, and the data is then sent over.

Reference: http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect

kayvonf

Question: The first two bullet points on this slide describe two major principles of implementing cache coherence. Someone should try to put my description in their own words.

smklein

@kayvonf: I will try explaining these principles in my own words --

Imagine a computer where processors could only read. Cache coherence is easy! No matter what, all processor caches remain consistent with one another, and sharing cache lines between processors causes no problems.

How can we introduce writing? At minimum, we must be sure that no one else is left holding outdated information.

We can make this our first step: if we want to write to X, we must trash the cache lines of everyone else who has a copy of X (i.e., broadcast a read-exclusive transaction).

Okay, cool, we can start writing. But when must we stop? Essentially, we can keep writing until someone else wants to write to X's cache line (in which case they trash our copy), or someone else wants to read from X's cache line (in which case they ask to share it).

Until this invalidation/sharing happens, no one else has read from or written to our cache line. We mark our state as "modified" (if we actually wrote to the line, and thus need to update main memory eventually) or "exclusive" (if our copy still matches main memory). This means we can keep writing without bothering to tell anyone else, since no one else needs to know what values are actually in X's cache line.
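To make the above concrete, here is a toy MESI-style state machine for a single cache line (my own simplification, not from the slides; it models only one local cache reacting to local and remote accesses, and ignores the actual bus/data transfer):

```python
# Toy sketch of the MESI transitions described above.
# States: "M" (modified), "E" (exclusive), "S" (shared), "I" (invalid).

class CacheLine:
    def __init__(self):
        self.state = "I"  # we start without a valid copy

    def local_read(self):
        if self.state == "I":
            # Read miss. Simplification: assume no other cache holds the
            # line, so we load it Exclusive (a real protocol would load
            # Shared if other sharers exist).
            self.state = "E"

    def local_write(self):
        # Writing requires exclusive ownership: broadcast read-exclusive
        # (invalidating everyone else's copy), then enter Modified.
        self.state = "M"

    def remote_write(self):
        # Another processor broadcast a read-exclusive: our copy is trashed.
        self.state = "I"

    def remote_read(self):
        # Another processor wants to read: downgrade to Shared
        # (a Modified line would also flush its dirty data to memory).
        if self.state in ("M", "E"):
            self.state = "S"
```

Note that while the line sits in "M" or "E", `local_read`/`local_write` change nothing and require no bus traffic, which is exactly the "keep writing without telling anyone" case above.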

kayvonf

Excellent @smklein. Now can someone try to do the same, but use the terms write propagation and write serialization in their explanation.

idl

@kayvonf I'll take a stab at this. The purpose of these coherence protocols is to ensure write propagation. This is where a processor P's write to a location in memory is propagated to, i.e. seen by, all the other processors. This gets at the heart of the coherence problem; without write propagation, other processors might get stale data upon reads. It can be ensured by the invalidation protocols as summarized by @smklein above, via read-exclusive broadcasts and the various states in the state diagrams on slide 20.

Write serialization follows from the definition of coherence: we strive to define a total ordering of the operations to any single memory location (i.e., on a line). This ensures that every processor observes the same order of reads/writes. Having the read-exclusive state ensures that only one processor may write to a memory location at any one time, and the result of write propagation is that any read that follows that write gets the updated value of the location (otherwise the read "dot" would appear before the write on the timeline).
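One way to see both properties together is a toy model where a shared bus serializes every write (all names here are mine; for simplicity this propagates updates rather than invalidations, unlike the invalidation protocol discussed above):

```python
# Toy model: a bus imposes a total order on writes (write serialization)
# and pushes each write to every cache (write propagation).

class Bus:
    def __init__(self):
        self.order = []    # the single, total order of writes
        self.caches = []

    def write(self, writer, addr, value):
        # Serialization: every write passes through here, one at a time.
        self.order.append((writer, addr, value))
        # Propagation: every cache sees the write, in that same order.
        for c in self.caches:
            c.observe(addr, value)

class Cache:
    def __init__(self, bus):
        self.data = {}
        self.seen = []     # the order of writes this cache observed
        bus.caches.append(self)

    def observe(self, addr, value):
        self.seen.append((addr, value))
        self.data[addr] = value

    def read(self, addr):
        return self.data.get(addr)
```

If P0 and P1 both write to X through the bus, every cache's `seen` list is identical (same total order), and any later read returns the last serialized value rather than a stale one.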

Q_Q

@lixf, @spilledmilk The 20 lanes refer to the fact that 20 bits can be transferred on one clock edge.

What probably actually happens: since QPI forms a ring, and the ring's message slots span a fixed number of clock edges (so if a slot is 5 edges wide, it can hold 20 lanes x 5 edges = 100 bits), the cache controller has to wait until it sees an empty slot pass by before it can put its message into the ring.
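A tiny model of that "wait for an empty slot" behavior (entirely my own sketch; slot counts and the drain mechanism are made up, and real QPI details surely differ):

```python
# Toy slotted ring: fixed slots circulate past the controller, which may
# only claim a slot that is currently empty.

class SlottedRing:
    def __init__(self, nslots):
        self.slots = [None] * nslots
        self.pos = 0  # index of the slot currently passing our controller

    def advance(self):
        # One "tick": the next slot rotates into view.
        self.pos = (self.pos + 1) % len(self.slots)

    def try_send(self, msg):
        # Claim the passing slot only if it is empty; otherwise the
        # controller must keep waiting (this is the stall @spilledmilk
        # guessed at).
        if self.slots[self.pos] is None:
            self.slots[self.pos] = msg
            return True
        return False
```

A controller would loop on `advance()` + `try_send()` until a free slot comes around, i.e., effectively spinning until the ring has capacity.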