BestBunny

Always aim to be on the computation bound side of this graph. When processing is bandwidth bound, the execution units sit idle part of the time waiting for data; when processing is computation bound, the ALUs are saturated and the system has reached its maximum possible throughput. To figure out which side of the Roofline model you're on, try increasing the number of mathematical operations without adding memory accesses, or decreasing memory accesses without changing the mathematical operations (one at a time, without changing the other). Which of these changes affects the overall execution time is determined by where the process lies on this graph: if extra math barely changes the run time, you are bandwidth bound, and if removing memory accesses barely changes it, you are computation bound.
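A minimal sketch of that experiment, using an idealized timing model rather than a real measurement. It assumes compute and memory traffic overlap perfectly, so run time is whichever of the two takes longer. The function names and the machine numbers (100 GFLOP/s, 10 GB/s) are hypothetical, chosen only for illustration.

```python
def run_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Idealized run time in seconds: the slower of compute and memory dominates."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    return max(compute_time, memory_time)


def classify(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Perturb one resource at a time and see which change moves the run time."""
    base = run_time(flops, bytes_moved, peak_flops, peak_bandwidth)
    # Experiment 1: double the math, keep memory traffic fixed.
    more_math = run_time(2 * flops, bytes_moved, peak_flops, peak_bandwidth)
    # Experiment 2: halve the memory traffic, keep the math fixed.
    less_memory = run_time(flops, bytes_moved / 2, peak_flops, peak_bandwidth)

    if more_math > base and less_memory == base:
        return "compute bound"
    if less_memory < base and more_math == base:
        return "bandwidth bound"
    # Both changes moved the run time, so the program sits close to the ridge.
    return "near the ridge point"


# Hypothetical machine: 100 GFLOP/s peak compute, 10 GB/s peak bandwidth.
PEAK_FLOPS = 100e9
PEAK_BANDWIDTH = 10e9

print(classify(1e9, 1e9, PEAK_FLOPS, PEAK_BANDWIDTH))    # 1 FLOP per byte
print(classify(100e9, 1e9, PEAK_FLOPS, PEAK_BANDWIDTH))  # 100 FLOPs per byte
```

On real hardware the two experiments would be timed runs of a modified kernel, and the comparisons would need a noise tolerance instead of exact equality.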

dmerigou

The ideal spot for a program is the point where the curve breaks from diagonal to horizontal, since it maximizes both the used bandwidth and the used processing power.

kapalani

Explanation of Graph:

The x axis of the graph represents the arithmetic intensity of the program (typically operations performed per byte of data moved to or from memory) and the y axis represents throughput. Each processor has a fixed maximum throughput. In the sloped region, the processor spends most of its time waiting for memory operations to be satisfied: since the arithmetic intensity is low, it doesn't perform enough arithmetic per memory access to keep the ALUs busy, so it doesn't achieve peak throughput because memory can only supply data at a certain rate. So the program is bandwidth bound. To improve performance, we either need to reduce the number of memory accesses per arithmetic operation or buy a memory system with higher bandwidth.

In the flat region, the program has high arithmetic intensity and is no longer limited by memory, i.e. it performs enough instructions for every memory access to hide the memory latency. So in this case, memory is no longer the bottleneck and is able to keep up with the rate at which the processor requests data. Hence the program is compute bound, and if we want to improve performance, we have to reduce the amount of work performed or buy a machine with more processing power.
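The two regions can be written as a single formula: attainable throughput is the smaller of the peak compute rate and bandwidth times arithmetic intensity. The sketch below is one way to express that, assuming intensity is measured in FLOPs per byte; the function names and the machine parameters are hypothetical and not taken from the slide.

```python
def attainable_throughput(intensity, peak_flops, peak_bandwidth):
    """Roofline: sloped region is bandwidth * intensity, flat region is peak_flops."""
    return min(peak_flops, peak_bandwidth * intensity)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity at which the sloped line meets the flat roof."""
    return peak_flops / peak_bandwidth


# Hypothetical machine: 100 GFLOP/s peak compute, 10 GB/s peak bandwidth.
PEAK_FLOPS = 100e9
PEAK_BANDWIDTH = 10e9

ridge = ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH)
for intensity in (0.5, 2.0, 10.0, 50.0):
    throughput = attainable_throughput(intensity, PEAK_FLOPS, PEAK_BANDWIDTH)
    region = "bandwidth bound" if intensity < ridge else "compute bound"
    print(f"{intensity:5.1f} FLOPs/byte -> {throughput / 1e9:6.1f} GFLOP/s ({region})")
```

Below the ridge point, raising arithmetic intensity raises throughput proportionally; above it, only a higher compute peak helps.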