If my understanding is correct, removing all math but loading the same data would test whether you are limited by latency, whereas changing all array accesses to A[0] would show whether you are limited by bandwidth?
anamdev
It might not just be if you're limited by bandwidth. If you are choosing ways to access data that involve poor locality, you can have a lot of accesses to memory due to cache misses.
ferozenaina
Adding to that, setting array accesses to A[0] would give you an idea about redesigning the data structures to improve data locality to decrease cache misses.
I feel that last point - removing atomic operations and locks and redesigning program to test and reduce sync overhead would be the hardest - the data structures would have to be changed to work without locks.
There is research that does this.
If my understanding is correct, removing all math but loading the same data would test whether you are limited by latency, whereas changing all array accesses to A[0] would show whether you are limited by bandwidth?
It might not just be if you're limited by bandwidth. If you are choosing ways to access data that involve poor locality, you can have a lot of accesses to memory due to cache misses.
Adding to that, setting array accesses to A[0] would give you an idea about redesigning the data structures to improve data locality to decrease cache misses.
I feel that last point - removing atomic operations and locks and redesigning program to test and reduce sync overhead would be the hardest - the data structures would have to be changed to work without locks.