Slide View : 15-418 Spring 2013

Previous | Next --- Slide 10 of 34

nkindberg

With a release consistency model, all re-orderings are allowed so the processor usually supports synchronization operations that are available to a programmer such as a fence operation to make some guarantees about what memory operations have been observed. A fence operation blocks until all memory accesses before it complete and prevents any additional memory accesses from occurring until the fence has completed. Once the fence completes you know that all writes that occurred prior to it are now visible to all processors.

One example programming model that has a fence operation is CUDA. CUDA has a __threadfence() operation that blocks the calling thread until all the shared and global memory accesses it made before the fence are visible to all other threads since writes to those locations are not guaranteed to be observed with the exception of atomic operations like atomicAdd. __syncthreads() will make sure all threads in a block have observed memory accesses by the threads in the block.

This comment was marked helpful 0 times.

Relaxed memory consistency is an example of "optimize for common cases".

In most cases, threads don't share variables or they just read the shared variables. So there is no need to maintain a strict ordering in such cases. And this model can achieve better performance due to reordering and hiding latency. If threads do need synchronization, the programs can use FENCE or Lock to guarantee ordering. Although it may increase the complexity of programs or even slow the system down in such cases, the overall performance can be improved.

This comment was marked helpful 2 times.

chaominy

@GG, as you mentioned this model is suitable for the workload with not many shared variables. How can this model be optimized for those "many shared variable" workload? Is a hybrid model possible?

This comment was marked helpful 0 times.

kayvonf

Another way to say what GG meant was that most of the time sequentially consistent semantics are not necessary since most of a program is not operating on shared variables. Therefore, it's arguably good design to allow some reordering for performance. The user always has the option of inserting fences when it matters for correctness. Of course, the burden is now on the programmer to realize when those situations exist.

This comment was marked helpful 0 times.

chaominy

@kayvonf thanks. I see... so in weak consistency model, "smart" programmers are responsible for managing FENSE to ensure the correctness and leverage the performance advantage of reordering at the same time.

This comment was marked helpful 0 times.

yongzuaw

Question: What kind of performance advantage can we get if we are able to reorder RR and WW? It's not clear that we can hide much latency here.

This comment was marked helpful 0 times.

pebbled

@yongzuaw We stand to hide the same amount of latency by relaxing the RR constraint as we do the RW constraint- that is, the latency of a read. In both cases, by relaxing the constraint, we allow the next operation to begin before one read has finished. The same goes for relaxing the WR and WW constraints. In both cases, we allow the following operation to start earlier by the same amount of time- the time it takes to perform a write.

This comment was marked helpful 0 times.