Slide 62 of 65
GG

I think using a big cache can help reduce the number of accesses to memory. So if the bandwidth limit is a problem on both GPUs and CPUs, why use a relatively small cache on GPUs?

mburman

Look at danielk2's comment on slide 59.

Bandwidth isn't generally a problem for GPUs. An NVIDIA GTX 480 GPU has a maximum bandwidth of 177 GB/sec. Compare this to an Intel Core i7 processor, which has a maximum memory bandwidth of about 25 GB/sec. That's a huge difference and is one reason GPUs do not need large caches.

I think another reason is that GPUs are designed for very high compute capability. As shown on this slide, they typically need to perform many more arithmetic operations per unit of data accessed than a CPU does. As a result, it makes sense by design to fill a GPU core with arithmetic units rather than extra cache memory.

kayvonf

Careful on this one: Remember it is about ratios, not about the absolute values.

A high-end GPU is typically connected to a memory system that can deliver significantly higher bandwidth than the memory system attached to a high-end CPU, but the GPU also has significantly more compute capability. This high compute capability, coupled with the GPU's smaller caches, often means that bandwidth limits do come into play in GPU programming. We certainly saw it on the last slide.
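To put rough numbers on the ratio (the peak compute figures here are approximate and depend on the exact parts being compared, but the bandwidth numbers match those quoted above):

    GTX 480:  ~1350 GFLOPS peak (single precision) / 177 GB/sec  ~= 7.6 FLOPs per byte  (~30 FLOPs per 4-byte float loaded)
    Core i7:   ~100 GFLOPS peak (single precision) /  25 GB/sec  ~= 4.0 FLOPs per byte  (~16 FLOPs per 4-byte float loaded)

So even though the GPU's absolute bandwidth is about 7x higher, its peak compute is more than 10x higher, which means a program has to do roughly twice as much math per byte of memory traffic to keep the GPU's ALUs busy. Programs that fall short of that ratio end up bandwidth bound.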

mburman

Ok, I agree that it's about ratios; I certainly did have GPU bandwidth issues on my final project last year.

Kayvon, I might be wrong here, but I think a more viable answer to @GG's comment is that it's not about bandwidth; it's about latency. GPUs have 1 GB of on-board DRAM. Caches are used to lower memory latency, but on GPUs, with the DRAM on the chip, latency is already low and the need for a large cache is substantially reduced.

kayvonf

Actually, the opposite is true. The GPU memory is off-chip DRAM, just like a CPU's main system memory. The GPU accesses its own high-performance DRAM located on the graphics card. Since the entire GPU system is designed for throughput, other components, such as the memory controller, tend to prioritize throughput over latency, so memory access times on a GPU tend to be higher than on CPUs. It's a bit of a feedback cycle: if you have a system that's all about throughput (and hiding latency), you start making design decisions for each component that don't prioritize latency... and as a result you have very high latencies... which means you need more latency hiding...
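A quick way to see how much latency hiding is needed: by Little's law, the amount of data that must be "in flight" to sustain a given bandwidth is bandwidth times latency. Assuming, say, a 500 ns memory latency (the exact number varies by GPU and access pattern, so treat it as an illustrative assumption):

    177 GB/sec * 500 ns  ~=  88 KB of data in flight
    88 KB / 128-byte memory transactions  ~=  ~700 outstanding requests

That's far more outstanding requests than a handful of CPU-style out-of-order cores could generate, which is why the GPU relies on thousands of hardware threads to keep the memory system busy.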

PINGAS

I think the point that "on-board != on-chip" is worth making clearer. Correct me if I'm wrong, but GPU DRAM is faster only because it's a slightly fancier type of DRAM; it's nothing like on-chip CPU caches.

Also, it seems that large caches would be less useful on a GPU than on a CPU, based on my guess that the typical GPU workload accesses the bulk of its data only once? (Not necessarily just once, I guess; you might want to do a second or third pass over all the data, but large caches still wouldn't help there.)
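As a concrete illustration of that kind of streaming workload, here's a minimal CUDA sketch of SAXPY (the names and launch configuration are just illustrative): each thread reads x[i] and y[i] once, does a multiply and an add, and writes y[i] once. There's no data reuse for a cache to capture, so a large cache wouldn't help; the kernel is limited by memory bandwidth.

    // Minimal SAXPY sketch: y[i] = a * x[i] + y[i]
    // Each element is read once and written once (no reuse), and only two
    // arithmetic ops are done per ~12 bytes of memory traffic, so the kernel
    // is bandwidth bound rather than compute bound.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Example launch for n elements:
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);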