Slide 40 of 44
top

I'm not sure I understand why exactly that diagonal line indicates the region where we are bandwidth limited. Can someone explain?

ericwang

I tried to explain it to myself using the units. Not sure if it makes sense.

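Notice that the unit of the X axis is Flops/Byte and the unit of the Y axis is GFlops/s. So Y divided by X (the slope of a line from the origin to a point on the curve) has units of GB/s. That should be the memory throughput the processor sustains at a given operational intensity when it runs at the attainable performance, right?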

Now look at the diagonal region: the throughput (slope) is fixed there. That fixed value is the memory bandwidth limit. In the horizontal region, the throughput decreases as we increase operational intensity. The processor is already computing as fast as it can, so it can no longer consume data at the full memory bandwidth. That is why it is called the compute-limited region.
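
To make the units argument concrete, here is a minimal Python sketch of the roofline as I understand it. The peak numbers are the Opteron X2 figures quoted elsewhere in this thread (17.6 GFlops/s and 15 GB/s); the function names are made up for illustration and the min() form is the usual way the model is written, not something taken from the slide.

```python
# Roofline sketch. Peak numbers are the Opteron X2 figures quoted in this
# thread; the function names are hypothetical, chosen only for illustration.
PEAK_GFLOPS = 17.6      # peak floating-point performance, GFlops/s
PEAK_BANDWIDTH = 15.0   # peak memory bandwidth, GB/s


def attainable_gflops(intensity):
    """Roofline height (GFlops/s) at an operational intensity in Flops/Byte."""
    return min(PEAK_GFLOPS, PEAK_BANDWIDTH * intensity)


def memory_traffic_gbs(intensity):
    """Memory throughput (GB/s) implied by a point on the roofline: Y / X."""
    return attainable_gflops(intensity) / intensity


# Print a few points on either side of the bend in the curve.
for intensity in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"{intensity:5.2f} Flops/Byte: "
          f"{attainable_gflops(intensity):5.2f} GFlops/s, "
          f"{memory_traffic_gbs(intensity):5.2f} GB/s")
```

Under these assumptions, the GB/s column stays at 15 for the low intensities (the diagonal region) and then starts to drop once the GFlops/s column is pinned at 17.6 (the horizontal region).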

afa4

When we say that a machine is compute limited (horizontal region above) does it mean that it is able to hide memory latency completely at that point?

funkysenior15

@afa4 Yes, I think so. The AMD Opteron X2 has a peak floating-point performance of 17.6 GFlops/s and a peak memory bandwidth of 15 GB/s. You can see that at the point where the Opteron X2 reaches its peak floating-point performance (17.6 GFlops/s), the operational intensity is a shade greater than 1 (about 1.17 Flops/Byte), which means that the memory bandwidth being used is about 15 GB/s.

From looking at the graph at least, it seems that at the point where the horizontal line begins, memory bandwidth is being completely used.
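
As a quick check of the numbers above, here is a small Python sketch that computes the point where the diagonal meets the horizontal line from the two quoted peaks. The variable and function names are my own, and the calculation assumes the simple two-limit model (peak compute and peak bandwidth only).

```python
# Ridge-point sketch using the Opteron X2 peaks quoted above.
# Names are hypothetical; only the two peak figures come from the thread.
PEAK_GFLOPS = 17.6      # GFlops/s
PEAK_BANDWIDTH = 15.0   # GB/s

# Intensity at which the bandwidth limit and the compute limit coincide.
ridge_intensity = PEAK_GFLOPS / PEAK_BANDWIDTH


def bandwidth_needed_gbs(intensity):
    """GB/s required to sustain peak GFlops/s at this intensity (Flops/Byte)."""
    return PEAK_GFLOPS / intensity


print(f"ridge point: {ridge_intensity:.2f} Flops/Byte")
print(f"bandwidth needed there: {bandwidth_needed_gbs(ridge_intensity):.1f} GB/s")
print(f"bandwidth needed at twice that intensity: "
      f"{bandwidth_needed_gbs(2 * ridge_intensity):.1f} GB/s")
```

If this model is right, the full 15 GB/s is needed exactly at the ridge point, and the bandwidth needed to keep the floating-point units busy shrinks as the intensity grows past it.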

(Note) I found this very interesting: http://www.eecs.berkeley.edu/~waterman/papers/roofline.pdf

russt17

@top The diagonal segments of the lines correspond to the processor being bandwidth limited because the positive slope means that increasing the arithmetic intensity still increases performance.

It's important to remember that a process is always SOMETHING-limited. If a process goes faster when we increase its arithmetic intensity, we know that it is NOT compute-limited, since it's able to do more computations per second. The assumption made in the conclusion here is that since it's not compute limited, it is bandwidth limited. There could be other causes; for example, it could be network limited, waiting on a server response.