Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Slide 33 of 57

kipper

With the NUMA approach, accessing local memory is faster than accessing remote memory. Local memory accesses reduce the average access time and also reduce the bandwidth demand placed on the network (this is from one of the recommended texts). With this model, though, I wonder how expensive it is to keep each processor's cache up to date.
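To make the "average access time" point concrete, here is a minimal sketch that treats the average as a weighted mix of local and remote latency. The function name and the 100 ns / 300 ns figures are assumptions picked for illustration, not measurements from any real machine.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Expected latency of one access when a fraction of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only for illustration.
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (0.5, 0.9, 1.0):
    avg = average_access_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local -> about {avg:.0f} ns per access")
```

Under these assumed numbers, raising the share of local accesses pulls the average toward the local latency, and every access that stays local is also one fewer request crossing the network.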

bullseye

With some NUMA systems, there is additional hardware/software to transfer data between different processors' memory banks. This is useful when multiple processors are accessing the same data, but moving the shared data slows down the processors involved. It also raises the problem of what happens when that data is modified. There are two primary types of NUMA systems: cache-coherent and non-cache-coherent. Cache-coherent NUMA systems attempt to keep modified data consistent across caches, while non-cache-coherent systems do not. However, maintaining cache coherence incurs a large amount of overhead, and is particularly difficult when multiple processors attempt to access the same memory location at once.
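The difference between the two types can be sketched with a toy model. This is not how any real protocol is implemented: it assumes write-through caches and a simple invalidate-on-write rule, and the class name `ToySystem` and its methods are made up for this example.

```python
class ToySystem:
    """Per-CPU caches over one shared memory, with optional write-invalidate."""

    def __init__(self, num_cpus, coherent):
        self.memory = {}
        self.caches = [{} for _ in range(num_cpus)]
        self.coherent = coherent

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:
            # Miss: fetch from memory (unwritten addresses read as 0).
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        # Write-through: update the writer's cache and memory together.
        self.caches[cpu][addr] = value
        self.memory[addr] = value
        if self.coherent:
            # Invalidate every other cached copy so later reads miss.
            for other, cache in enumerate(self.caches):
                if other != cpu:
                    cache.pop(addr, None)


def read_after_remote_write(coherent):
    system = ToySystem(num_cpus=2, coherent=coherent)
    system.read(0, 0x10)       # CPU 0 caches the initial value
    system.write(1, 0x10, 42)  # CPU 1 modifies the same address
    return system.read(0, 0x10)


print("coherent:", read_after_remote_write(True))
print("non-coherent:", read_after_remote_write(False))
```

In this model the coherent system pays for an extra invalidation on every write, while the non-coherent one lets CPU 0 keep hitting on its old copy with nothing telling it the value has changed.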

CaptainBlueBear

@bullseye I don't know if you know the answer, but does that mean that ensuring cache coherence has more overhead than a cache miss? For non-cache-coherent systems, isn't accessing the cache and finding out the data is out of date equivalent to a cache miss?

PID_1

@CaptainBlueBear I think it depends on the relative cost of main memory access and communication between caches. If the cost of communicating among caches is smaller than the cost of going all the way to main memory (a reasonable situation, since the caches are closer to each other than they are to memory), then the coherence protocol may not be as expensive as a miss would be.
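A rough way to see that comparison, assuming a peer cache can supply a line directly when it holds it. The function name and both latency figures are hypothetical and only there to show the shape of the trade-off.

```python
def miss_service_ns(peer_has_line, cache_to_cache_ns, memory_ns):
    """Latency to service one miss in a toy model.

    Assumption: if another cache holds the line, the coherence protocol
    supplies it cache-to-cache; otherwise the request goes to main memory.
    """
    return cache_to_cache_ns if peer_has_line else memory_ns


# Hypothetical latencies, chosen only for illustration.
CACHE_TO_CACHE_NS = 60
MEMORY_NS = 200

from_peer = miss_service_ns(True, CACHE_TO_CACHE_NS, MEMORY_NS)
from_memory = miss_service_ns(False, CACHE_TO_CACHE_NS, MEMORY_NS)
print(f"served by peer cache: {from_peer} ns, served by memory: {from_memory} ns")
```

If the real cache-to-cache cost were higher than the memory cost, the conclusion would flip, which is why the answer depends on the machine.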