These results were collected on a shared-memory system. Would running on a distributed-memory system instead have any effect on the results?
nba16235
Could someone explain why the 4D blocked data layout can reduce the waiting time due to barriers?
EggyLv999
@nba Barrier wait time comes from some threads taking longer than others: every thread has to wait for the slowest one. By reducing the mean time to fetch data through the cache, we've also reduced its variance, which means the slowest thread won't take as long on average to finish.
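To make the layout itself concrete, here is a minimal sketch of how a 4D blocked layout might map a 2D grid coordinate to a flat memory offset, next to plain row-major order. The function names and the assumption that the block size evenly divides the grid width are mine for illustration, not taken from the lecture.

```python
def row_major_index(i, j, n_cols):
    """Flat offset of element (i, j) in an ordinary row-major 2D array."""
    return i * n_cols + j


def blocked_index(i, j, n_cols, block):
    """Flat offset of element (i, j) in a 4D blocked layout.

    The grid is split into block x block tiles, and each tile is stored
    contiguously. Conceptually the array is indexed as
    [block_row][block_col][row_in_block][col_in_block].
    Assumption: block evenly divides n_cols (and the number of rows).
    """
    blocks_per_row = n_cols // block
    block_row, row_in_block = divmod(i, block)
    block_col, col_in_block = divmod(j, block)
    tile_number = block_row * blocks_per_row + block_col
    return (tile_number * block + row_in_block) * block + col_in_block


def tile_offsets(index_fn, block):
    """Sorted offsets touched by the top-left block x block tile."""
    return sorted(index_fn(i, j) for i in range(block) for j in range(block))
```

For an 8x8 grid with 4x4 blocks, the top-left tile occupies offsets 0 through 15 in the blocked layout, while in row-major order the same 16 elements are spread across offsets 0 through 27. A thread working on one tile therefore touches fewer distinct cache lines, which is where the lower and more uniform memory time comes from.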