Question: Why must programs have high arithmetic intensity to efficiently utilize modern throughput processors?
@kayvonf Programs with lower arithmetic intensity have to fetch data from memory more often per math operation, and a processor that has finished its math operations sits idle while it waits out the memory latency. More idling means lower throughput.
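To put rough numbers on this, here is a minimal sketch of a roofline-style bound: attainable throughput is capped either by the processor's peak math rate or by how fast memory can feed it. The function name and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are made up for illustration and do not describe any real chip.

```python
def attainable_gflops(intensity_flops_per_byte, peak_gflops, bandwidth_gb_per_s):
    """Upper bound on throughput: the lower of the compute peak and
    the rate at which memory traffic can supply operands."""
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gb_per_s)


# Hypothetical machine, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_PER_S = 100.0

if __name__ == "__main__":
    for intensity in (0.5, 1.0, 10.0, 50.0):
        gflops = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
        print(f"{intensity:5.1f} flops/byte -> at most {gflops:7.1f} GFLOP/s")
```

On this assumed machine, a program needs at least 10 flops per byte fetched before the math units, rather than memory, become the limit.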
@kayvonf Consider 2 execution contexts on a single core. For each thread, let T_a be the total time spent on arithmetic operations and T_m the total time spent on memory accesses, with the two threads interleaving their execution. While one thread is stalled on memory, the core only stays busy if the other thread has arithmetic to run. If T_a / T_m < 1 / 2, the core will definitely be idle at some point, because both threads will be waiting for their data at the same time.
If we increase arithmetic intensity until T_a / T_m > 1 / 2, the core spends less time with both threads stalled, so we utilize it more efficiently.
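The two-context argument can be sketched with a toy model. This is an assumption of mine, not something stated in the thread: each thread repeatedly computes for T_a and then stalls for T_m, the core switches contexts for free, and only one thread computes at a time. Under those assumptions the fraction of time the core is busy is min(1, N * T_a / (T_a + T_m)) for N contexts. The function name `core_utilization` is hypothetical.

```python
def core_utilization(t_a, t_m, contexts=2):
    """Fraction of time the core is doing arithmetic in a toy model:
    each thread alternates t_a of compute with t_m of memory stall,
    and context switches cost nothing."""
    if t_a <= 0 or t_m < 0 or contexts < 1:
        raise ValueError("need t_a > 0, t_m >= 0, contexts >= 1")
    return min(1.0, contexts * t_a / (t_a + t_m))


if __name__ == "__main__":
    for ratio in (0.25, 0.5, 1.0, 2.0):
        # Fix t_m = 1 so that t_a equals the ratio T_a / T_m.
        util = core_utilization(t_a=ratio, t_m=1.0, contexts=2)
        print(f"T_a/T_m = {ratio:4.2f} -> utilization {util:.2f}")
```

In this model two contexts only keep the core fully busy once T_a >= T_m, so a ratio below 1/2 is well inside the idle regime, and adding more contexts lowers the ratio needed to hide the stalls.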
In addition to what the other commenters wrote, even if the CPU is not idling in a low-arithmetic-intensity setting, it might not be taking advantage of ILP/hyperthreading.