Question about HyperTransport/QPI: When you increase the number of processors, do you need more of these links, or can HyperTransport/QPI function one-to-many?
During the lecture, I had a question about Memory to Memory Controller mapping. I asked whether having multiple memory controllers could theoretically increase performance by accessing two different areas of memory in parallel.
From what I looked up and understood, the bottleneck is not the memory controller itself but the memory bus that connects the controller to the memory. A mechanism called dual-channel architecture gives one memory controller multiple independent channels (buses), each connected to its own memory modules. This, in turn, allows the memory controller to access two different memory modules in parallel.
I happen to have experience optimizing memory access on a NUMA machine. It turns out that the first-touch policy in Linux is pretty useful for dealing with NUMA problems.
Imagine a large array that needs to be processed in parallel by two CPUs, each with its own local memory. In Linux, memory is not bound to a specific physical location when you call malloc or new, but when you read or write it for the first time. Therefore, if you leave the array uninitialized after allocating it and have each thread initialize the part of the array it will later process, you automatically avoid most of the NUMA latency, because by default the kernel places each page in the RAM closest to the CPU that first touches it.
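Here is a minimal sketch of that first-touch pattern in C++. The function name `first_touch_sum` and the "fill with 1.0, then sum" workload are made up for illustration. The sketch also assumes the default Linux memory policy and an allocation large enough that it gets fresh, untouched pages. It does not pin threads to CPUs. In a real program you would probably want to do that as well (for example with `pthread_setaffinity_np` or by launching under `numactl`), since otherwise the scheduler can move a thread away from the node where its pages ended up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical helper: each thread initializes (first-touches) its own slice
// of the array and then processes that same slice.
double first_touch_sum(std::size_t n, unsigned num_threads) {
    assert(num_threads > 0 && "need at least one thread");

    // new double[n] default-initializes, i.e. it does not write to the
    // elements, so the allocating thread does not touch the pages here.
    std::unique_ptr<double[]> data(new double[n]);

    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;

            // First touch: this write is what makes the kernel pick a
            // physical page, by default on the node of the CPU running
            // this thread.
            for (std::size_t i = begin; i < end; ++i) data[i] = 1.0;

            // Later processing by the same thread reads the same slice.
            double sum = 0.0;
            for (std::size_t i = begin; i < end; ++i) sum += data[i];
            partial[t] = sum;
        });
    }
    for (auto& w : workers) w.join();

    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

The point is that the thread that writes a slice first is also the thread that reads it later. If the main thread had initialized the whole array up front (for example with `std::vector<double>(n)`, which zero-fills), all pages would likely land on the main thread's node and the other CPU would pay the remote-access cost.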