Slide 12 of 69

Why does efficiency matter? If you're bottlenecked by memory bandwidth, why do you care whether your processor is running at 3% efficiency or 40% efficiency?


@aoeuidhtns Think back to the discussion we had on this slide.


I don't think he's saying that it's a terrible thing that we only have 3% efficiency. For this operation, 3% efficiency is about as good as it gets. My understanding is that this example just seeks to illustrate that speed and efficiency are two different things.


@marjorie: exactly!


@aoeuidhtns: And also take a look at my comments on the same slide in lecture 2. The system is inefficient in that NVIDIA could have built a much smaller (and much cheaper) chip, and it still would have had the same performance on this application.

If this was the computation that you cared about most, this would be a terrible chip design. Luckily, the primary workload for a GPU design is 3D graphics, and 3D graphics tends to have very high arithmetic intensity.
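To make the bandwidth-vs-efficiency point concrete, here's a back-of-the-envelope roofline-style sketch. The peak compute and bandwidth numbers are illustrative assumptions (not taken from the slide), and SAXPY stands in for a typical low-intensity kernel:

```python
# Arithmetic intensity = flops performed per byte of memory traffic.
# SAXPY: y[i] = a * x[i] + y[i] -> 2 flops per element, 12 bytes of
# traffic per element (read x[i], read y[i], write y[i]; 4-byte floats).
saxpy_ai = 2 / 12  # ~0.17 flops/byte

# Hypothetical machine: 1 TFLOP/s of compute, 100 GB/s of bandwidth.
peak_gflops = 1000.0
peak_gbps = 100.0

# A kernel can't run faster than either the compute roof or the
# bandwidth roof (bytes/s * flops/byte).
attainable_gflops = min(peak_gflops, saxpy_ai * peak_gbps)
utilization = attainable_gflops / peak_gflops

print(f"attainable: {attainable_gflops:.1f} GFLOP/s "
      f"({utilization:.1%} of peak)")
# -> attainable: 16.7 GFLOP/s (1.7% of peak)
```

So even with the memory system running flat out, a low-intensity kernel leaves the ALUs almost entirely idle, which is exactly the "fast but inefficient" situation being discussed.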


You mentioned that GPU workload typically has very high arithmetic intensity, so this low bandwidth is okay. I'd imagine that CPUs tend to use memory at least somewhat more. Because of this, are they more typically designed for higher bandwidth? Are there limits we're running into in terms of bandwidth -- in other words, does it look like we'll be able to keep increasing bandwidth significantly for a while still?


I would say that CPUs are designed to be as efficient as possible with the memory bandwidth they are allotted, taking into consideration that they will need to support more memory operations.

This is highlighted by the many layers of caches in CPUs. It's more of an "optimize for bandwidth" rather than "expect bandwidth" philosophy.

As to your second question, max possible bandwidth is really only limited by our materials science research, so at this time it is just a matter of buying more bandwidth.


CPU design is governed by the fact that many popular everyday applications cache well (databases, business applications, MS Word, etc). Since caches are very effective at servicing most memory requests, large amounts of off-chip bandwidth to memory are not necessary to provide data to compute units.

However, graphics, multi-media, data-mining, and scientific computations -- all applications that very much benefit from throughput-oriented parallel architectures -- don't exhibit such good cache behavior. Since a cache hierarchy is less effective, you see designs that lean more heavily on high-bandwidth memory systems and multi-threading to hide memory latencies.
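The cache-behavior difference above can be illustrated with a toy LRU cache simulation (my own sketch, not course code): a workload whose working set fits in the cache hits almost always, while a streaming workload that touches each address exactly once never hits, no matter how big the cache is.

```python
from collections import OrderedDict

def hit_rate(addresses, capacity=64):
    """Simulate a fully associative LRU cache; return fraction of hits."""
    cache, hits = OrderedDict(), 0
    for a in addresses:
        if a in cache:
            hits += 1
            cache.move_to_end(a)   # mark as most recently used
        else:
            cache[a] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(addresses)

# Working set of 32 lines, reused over and over (fits in the cache).
reuse_heavy = [i % 32 for i in range(10000)]
# Pure streaming: every address touched exactly once, zero reuse.
streaming = list(range(10000))

print(hit_rate(reuse_heavy))  # ~0.997 (misses only during warm-up)
print(hit_rate(streaming))    # 0.0
```

This is why the "everyday application" CPU design can get away with modest off-chip bandwidth, while the streaming workloads have to be fed directly from memory.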


This example showed me how differently a well-parallelized algorithm and its actual implementation can perform.