More info from the PSC site:
"The sixteen cores on each blade share 128 Gbytes of local memory. Thus, each core has 8 Gbytes of memory and the total capacity of the machine is 32 Tbytes. This 32 Tbytes is divided into two partitions of 16 Tbytes of hardware-enabled shared coherent memory. Thus, users can run shared memory jobs that ask for as much as 16 Tbytes of memory. Hybrid jobs using MPI and threads and UPC jobs that need the full 32 Tbytes of memory can be accommodated on request."
http://www.psc.edu/index.php/computing-resources/blacklight#hardware
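To make the arithmetic in that quote concrete, here is a quick sketch that rederives the numbers. The blade and core counts are not stated in the quote; they fall out of the quoted figures only if you assume binary units (1 Tbyte = 1024 Gbytes), so treat them as implied rather than official.

```python
# Figures taken from the PSC quote above.
CORES_PER_BLADE = 16
GB_PER_BLADE = 128
TOTAL_TB = 32
PARTITIONS = 2

# Assumption (not in the quote): binary units, 1 Tbyte = 1024 Gbytes.
GB_PER_TB = 1024

# Memory available per core on a blade.
gb_per_core = GB_PER_BLADE / CORES_PER_BLADE

# Blade and core counts implied by the total capacity.
blades = TOTAL_TB * GB_PER_TB // GB_PER_BLADE
cores = blades * CORES_PER_BLADE

# Largest shared-memory job: one coherent partition.
tb_per_partition = TOTAL_TB / PARTITIONS

print(f"{gb_per_core:g} Gbytes per core")
print(f"{blades} blades, {cores} cores (implied)")
print(f"{tb_per_partition:g} Tbytes per coherent partition")
```

Under that assumption the quoted 8 Gbytes per core and 16 Tbytes per partition check out, and the machine works out to 256 blades.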
Some interesting stats about the NUMA performance of Blacklight:
Latency to DRAM on the same blade as a processor: ~200 clocks
Latency to DRAM located on another blade: ~1500 clocks
Source: slide 6 of http://staff.psc.edu/blood/XSEDE/Blacklight_PSC.pptx
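A rough way to see why data placement matters with these two latencies is to compute the average DRAM latency as a function of how many accesses go off-blade. This is only a back-of-the-envelope sketch: the linear model and the `average_latency` helper are my own assumptions, and it ignores caches, contention, and the fact that the latency to another blade likely depends on how far away that blade is.

```python
# Approximate latencies from the slide cited above, in clock cycles.
LOCAL_CLOCKS = 200    # DRAM on the same blade as the processor
REMOTE_CLOCKS = 1500  # DRAM located on another blade


def average_latency(remote_fraction: float) -> float:
    """Expected DRAM latency in clocks when `remote_fraction` of
    accesses go to another blade (hypothetical linear model)."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    local_fraction = 1.0 - remote_fraction
    return local_fraction * LOCAL_CLOCKS + remote_fraction * REMOTE_CLOCKS


# How the average grows as more accesses leave the blade.
for fraction in (0.0, 0.1, 0.5, 1.0):
    print(f"{fraction:.0%} remote: ~{average_latency(fraction):.0f} clocks")

# Remote accesses cost this many times a local access.
print(f"remote/local ratio: {REMOTE_CLOCKS / LOCAL_CLOCKS:.1f}x")
```

Even in this simple model, sending just 10% of accesses to another blade raises the average from about 200 to about 330 clocks, which is why keeping a thread's data on its own blade is worth the effort.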