Slide 12 of 37
cacay

A related idea, and somewhat of an alternative to coherence, is relativistic programming.

grose

For a related idea, CUDA employs a form of weak coherence, where independent read instructions on a thread may complete in any order; similarly, the order in which a thread writes data is not necessarily the order observed by another thread, unless you use __syncthreads().

fgomezfr

@grose I think what you're saying is that CUDA does not employ coherence :)

My experience with GPUs is that they use asynchronous memory accesses, which means two reads may not return in the same order that they were initiated. For a given thread (really, a warp) you maintain a 'weak' ordering by issuing async wait instructions to make sure that reads from an address return before any write to that address is initiated, and by allowing only one write to a particular address in flight at a time. Within such an ordering, the exact return order of overlapping reads doesn't matter for correctness.

As you've noted, the order of writes to separate addresses is also not guaranteed to match the order in which they were initiated, but the writes will become visible to all threads (within a block) in the same order, because all threads in the block are reading out of the same shared memory. In other words, if thread A writes B then C, there may be a point in time when the value of C has changed but B hasn't. However, all threads in the block will see this same result; no thread will see B change before C. It's 'coherent' in the sense that all threads in a block see the same order, but it doesn't follow the rules on this slide that define that order. As a result, you have to use __syncthreads() or a memory barrier to enforce a desired order.

kayvonf

@grose and @fgomezfr: You are both talking about memory consistency and not memory coherence. Coherence is concerned with the order of reads and writes to A (the same address). Consistency concerns the order of reads/writes to A and reads/writes to B (different addresses).

Please see slide 15, and then we'll pick this discussion back up next week.

HingOn

Does "hypothetical serial order of all program operations" mean the sequence of LD/ST instructions on a common memory address has to be fixed and unique? Or does it mean that there can be multiple sequences and the memory system is coherent as long as the ultimate result value is the same?

ChandlerBing

@HingOn, I would think it means the latter. For instance, on the timeline shown on this slide, exchanging P1's read(5) and P2's read(5) would give us a different order which still satisfies both conditions, and the final result is also the same.

grose

Yes, any serial sequence that produces that result will do. Of course, there will also be serial sequences with different results.

ananyak

I understand the intuition behind cache coherence, but I don't understand the formalization. Based on the definition on this slide, what does coherence give us? Take slide 5, for example, which illustrates the coherence problem. A coherent memory system, as defined on this slide, doesn't seem to solve the problem. Below is a hypothetical serial order that produces the same result as the incoherent memory system on slide 5.

p1 load x (0)

p2 load x (0)

p2 load x (0)

p3 load x (0)

p3 store x (2)

p1 store x (1)

Final value of x = 1

Is there any concrete example where a memory-coherent system cannot produce some output that an incoherent memory system can? If not, then what, precisely speaking, does coherence give us? It seems to me that coherence primarily has value when combined with locks/synchronization, so locks/syncs would need to be added to the formalization for it to have meaning.

ananyak

I realize that one problem this definition solves is coherence across different memory addresses that are on the same cache line. For example, suppose that memory addresses 20 and 24 are on the same cache line and initialized to 0, and consider the following sequence of instructions.

p1 writes 5 to address 20

p2 writes 5 to address 24

(both processors flush their caches)

Without coherence, either address 20 or address 24 will not be updated (depending on which cache line gets flushed to memory first). This is related to the false sharing problem. But does coherence solve anything besides this issue?

kayvonf

Another example: what could happen without coherence here? Assume each core has its own cache and the two caches are not coherent.

int x = 0;  // assume no register allocation.
            // All accesses are loads and stores.

Thread 1:

 x = 1;

Thread 2:

 while(x == 0);
 print "I see it!";

jazzbass

@kayvonf Thread 2 never observes the write to x from Thread 1, therefore "I see it!" is never printed.

Thread 1 writes to x but changes are reflected only in the cache of the core where thread 1 is running (assuming writeback cache). Thread 2 initially reads the value of x from memory but will see x = 0, even if Thread 1 already did the write. After this, thread 2 will keep reading the value of x from its cache, and will never see the updated value.