Slide 31 of 35
askumar

If GPUs do not implement cache coherence, does this mean it is possible to have inconsistent data in the cache and memory of the GPU? Or do they just avoid coherence issues by only using shared caches?

kayvonf

Yes, it is certainly possible. For example, each SM core in an NVIDIA GPU has its own L1 cache, and there is a single shared L2 for the chip. My understanding is that the L1s use a write-through policy.

The L1s are not coherent, so it is certainly possible that a write by one core is not observed by a subsequent read from another core. I believe, however, that all memory operations are guaranteed to be visible by the next kernel invocation.
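To make the kernel-boundary point concrete, here is a minimal sketch, assuming both kernels are launched into the same (default) stream so that the second starts only after the first has finished. The names `produce`, `consume`, and `count_mismatches` are hypothetical, chosen for illustration. Each thread in the second kernel reads an element that was, in general, written by a different thread block in the first kernel.

```cuda
#include <vector>
#include <cuda_runtime.h>

// First kernel: every thread writes its own global index into data[].
__global__ void produce(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Second kernel: every thread reads the mirrored element, which was
// generally written by a thread in a different block of the first kernel.
__global__ void consume(const int *data, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = data[n - 1 - i];
}

// Hypothetical host helper: runs both kernels back to back in the default
// stream and returns how many reads did NOT see the producer's write.
int count_mismatches(int blocks, int threads_per_block) {
    int n = blocks * threads_per_block;
    int *d_data = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));

    produce<<<blocks, threads_per_block>>>(d_data, n);
    // Same stream, so consume begins only after produce has completed.
    consume<<<blocks, threads_per_block>>>(d_data, d_out, n);

    std::vector<int> h_out(n);
    // cudaMemcpy on the default stream waits for the kernels to finish.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != n - 1 - i) ++mismatches;
    }
    return mismatches;
}
```

Splitting the work into two launches uses the kernel boundary as the synchronization point, so no coherence between the L1s is needed for the consumer to see the producer's writes.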

An interesting question is which CUDA programs would benefit from full-blown support for coherence across the L1s. Note that this would entail inter-thread-block communication, and CUDA does provide a set of atomic operations for this purpose. The atomic operations are explicit, and they are treated specially not only to ensure their atomicity, but also so that other processors observe the updates.
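As a rough illustration of atomics used for inter-thread-block communication, the sketch below has every thread in every block increment a single counter in global memory with `atomicAdd`. The kernel name `count_threads` and the host helper `run_count` are hypothetical, and error handling is reduced to a single assertion for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every thread, regardless of which block (and SM) it runs on,
// atomically adds 1 to the same counter in global memory.
__global__ void count_threads(unsigned int *counter) {
    atomicAdd(counter, 1u);
}

// Hypothetical host helper: launches the kernel and returns the final count.
unsigned int run_count(int blocks, int threads_per_block) {
    unsigned int *d_counter = nullptr;
    unsigned int h_counter = 0;

    cudaError_t err = cudaMalloc(&d_counter, sizeof(unsigned int));
    assert(err == cudaSuccess);
    cudaMemset(d_counter, 0, sizeof(unsigned int));

    count_threads<<<blocks, threads_per_block>>>(d_counter);

    // The copy waits for the kernel to finish before reading the counter.
    cudaMemcpy(&h_counter, d_counter, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}
```

With a plain `*counter += 1` in place of `atomicAdd`, concurrent read-modify-write sequences from different threads could overwrite each other and the final count would typically come up short. With the atomic, `run_count(64, 256)` should return 16384.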