Slide 35 of 60
viveksri

For a NUMA-style interconnect in a really large system like a supercomputer, is there any way to speed up memory accesses between physically far regions, or is it just a matter of structuring the topology of the memory bus to minimize latency?

efang

As far as I can recall @viveksri, one approach is to place hierarchies of memory clustered around the processors to reduce latency; beyond that, though, it mostly seems to boil down to using low-latency materials for the connections.

vasua

I wonder what exactly the numbers are for latency to close and far regions of memory. The hardware at both ends should be the same regardless of the distance to the region of memory; the only difference should be the length of the wire. As more and more supercomputers / datacenters switch to fibre interconnects, data should be able to travel at close to light speed. That would lead me to believe that there really isn't that much difference in latency between accessing something over in the next rack and something over in the next room.

What order of latency difference are we looking at? 1x? 10x? 100x?

crow

https://en.wikipedia.org/wiki/InfiniBand#Performance

According to some of the numbers on this page about a common interconnect used in supercomputers, the latency is on the order of 1 microsecond, which is between 10^3 and 10^4 clock cycles.

It also seems that in the far future, the main contributor to latency will be the speed of light. In a supercomputer that spans a 1 km by 1 km building, light takes on the order of 3 microseconds just to cross the machine, which is about 10^4 clock cycles. Then for any fixed bits-per-m^3 storage solution, if you want a computer with n bits of storage, each bit will take time on the order of n^(1/3) to access in expectation, due to the fundamental limit of light speed.

Therefore, the commonly used RAM model of computation, which assumes constant-time access, may need to be revisited.
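A quick back-of-envelope check of the numbers above (the 3 GHz clock rate is an assumption, not from the InfiniBand page):

```python
# Sanity-check the latency figures above; clock rate is an assumed 3 GHz.
C = 3.0e8          # speed of light in vacuum, m/s
CLOCK_HZ = 3.0e9   # assumed CPU clock rate

# ~1 microsecond of interconnect latency, expressed in clock cycles:
ib_cycles = 1.0e-6 * CLOCK_HZ
print(f"~1 us interconnect latency = {ib_cycles:.0f} cycles")  # 3000, between 10^3 and 10^4

# One-way light travel time across a 1 km machine:
t_light = 1000.0 / C
print(f"1 km at light speed = {t_light * 1e6:.2f} us = {t_light * CLOCK_HZ:.0f} cycles")

# Cube-root scaling: at fixed bit density, growing total storage n by 8x
# grows the expected access distance (and hence latency) by 8^(1/3) = 2x.
for n in (1, 8, 64):
    print(f"{n}x storage -> {n ** (1/3):.0f}x distance")
```

So the speed-of-light floor across a 1 km building (~3.3 us one way) is already a few times the full end-to-end latency of today's interconnects.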

tkli

Although the speed of light may become a factor in memory accesses in the future, data locality and good parallelization may heavily mitigate the very long distances that information has to travel. One way to think about it: a stick of RAM can hold ~4 GB, so if all data dependencies are contained within around 100 GB of each other, the distance traveled is relatively minimal, e.g. within the same rack. Of course, it is conceivable that certain important problems have such dense long-range dependencies that they require reconsidering these models.
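A rough sketch of this locality argument with concrete numbers; the ~2 m rack span and 3 GHz clock are assumptions for illustration:

```python
# Rough estimate: light-speed delay within one rack (assumed numbers).
C = 3.0e8          # speed of light in vacuum, m/s
CLOCK_HZ = 3.0e9   # assumed CPU clock rate

dimm_gb = 4                       # capacity per stick of RAM (from the comment)
working_set_gb = 100              # span of the data dependencies
sticks = working_set_gb / dimm_gb # 25 DIMMs -> plausibly fits in one rack

rack_span_m = 2.0                 # assumed physical span of a rack
t = rack_span_m / C               # one-way light travel time within the rack
print(f"{sticks:.0f} DIMMs; {t * 1e9:.1f} ns one-way = ~{t * CLOCK_HZ:.0f} cycles")
```

In other words, if dependencies stay rack-local, the light-speed component is tens of cycles, dwarfed by the ~10^3 cycles of interconnect latency quoted above.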