Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015


Berry

Interesting article here relating to the way modern NVIDIA GPUs can run into big trouble when the interconnect is missing a few lanes: http://www.pcper.com/reviews/Graphics-Cards/NVIDIA-Discloses-Full-Memory-Structure-and-Limitations-GTX-970

In short, it is now easier to disable SMs and their individual caches than to print a new die, which is what NVIDIA did to the GTX 980 to come out with the GTX 970. (A few years ago AMD tried to simply use a different firmware, which people just reflashed to save $200 on a higher-tier card.) The problem was that they had to disable one L2 cache and several SMs to get the die to suck hard enough not to cannibalize the GTX 980. As a result, the 7th crossbar port would get twice as many requests, since it has to talk to its two sets of DRAM banks through only one L2 cache.

To avoid synchronizing everything to the slow 7th crossbar port, NVIDIA moved the 0.5GB that the second DRAM bank of the 7th crossbar port was responsible for into a separate memory pool and used clever scheduling to mask the latency (at least that's what I got from the article). Yet any time the die starts addressing more than 3.5GB, the frame rate takes a nosedive.
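
To get a feel for why crossing 3.5GB hurts so much, here is a rough back-of-the-envelope model in Python. It is only a sketch under my own assumptions, not how the driver actually behaves: I assume the 3.5GB pool is striped across seven memory controllers at about 28 GB/s each (196 GB/s total), the 0.5GB pool sits behind a single controller (28 GB/s), the two pools are not read at the same time, and accesses are spread uniformly over the working set. The function name and constants are hypothetical.

```python
# Toy model of the GTX 970's split memory pools.
# All bandwidth figures below are assumptions for illustration.
FAST_POOL_GB = 3.5     # pool striped across seven controllers
SLOW_POOL_GB = 0.5     # pool behind the single remaining controller
FAST_BW_GBPS = 196.0   # assumed: 7 controllers x 28 GB/s
SLOW_BW_GBPS = 28.0    # assumed: 1 controller x 28 GB/s


def effective_bandwidth(working_set_gb):
    """Estimate average bandwidth (GB/s) for one full pass over the working set.

    Assumes the fast pool is filled first and that the two pools are
    streamed one after the other rather than concurrently.
    """
    total_gb = FAST_POOL_GB + SLOW_POOL_GB
    if not 0 < working_set_gb <= total_gb:
        raise ValueError("working set must be in (0, 4.0] GB")

    fast_gb = min(working_set_gb, FAST_POOL_GB)
    slow_gb = working_set_gb - fast_gb

    # Time to stream each portion at its pool's bandwidth.
    seconds = fast_gb / FAST_BW_GBPS + slow_gb / SLOW_BW_GBPS
    return working_set_gb / seconds


if __name__ == "__main__":
    for size in (2.0, 3.5, 3.75, 4.0):
        print(f"{size:.2f} GB -> {effective_bandwidth(size):.1f} GB/s")
```

Under these assumptions a 3.5GB working set streams at the full 196 GB/s, while a 4GB working set averages 112 GB/s, because the last 0.5GB takes as long to read as the first 3.5GB. Real games will not touch memory uniformly, so take the exact numbers with a grain of salt.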

The question I still have is why they didn't just remove more SMs to bring down the FLOPS.

andymochi

@Berry This video popped up on reddit the week of our cache coherence lectures - it was also about the Nvidia 970. Even though it's fake, it's still a pretty funny commentary on how the whole thing was handled.