planteurJMTLG

Fun fact about the name "CUDA", according to Wikipedia: "CUDA was an acronym for Compute Unified Device Architecture, but Nvidia subsequently dropped the use of the acronym", so CUDA doesn't mean anything anymore.

pdp

why is the design goal "maintain low abstraction distance"? To provide more flexibility to the programmer?

kayvonf

@pdp. I like this question a lot.

"Low abstraction distance" means that the abstractions presented to the programmer closely mimic the model of how the system will actually implement/run the program on a machine. This is often desirable when its expected that programmers will care heavily about performance, and thus want to maintain visibility of how it is run.

For example, if a programmer writes a program against high-level abstractions that permit a compiler to perform complex, global changes to program structure, it can be difficult for the programmer to performance-tune their code, since the code they wrote looks nothing like the program that actually ran on the machine. In contrast, a programming system catering to novice programmers might choose a high level of abstraction so that the compiler/runtime system makes all the performance-related decisions, since the programmers might not have the knowledge to make good ones. However, the compiler's decisions might not be as good as those made by an expert programmer. In that case, it might be advisable to provide experts with lower-level abstractions that allow the programmer to map an algorithm to a machine in the manner they know is best.

Here's an example: how would you compare CUDA's or ISPC's level of abstraction (launch a number of SPMD threads that can essentially do whatever they want) with the more highly structured stream programming model, where the compiler is afforded additional information about the program that enables it to attempt more global code optimizations, such as this? If I showed you the stream program on that slide and asked you about its arithmetic intensity, you might inspect the individual foo and bar kernels. However, that would not be the arithmetic intensity of the program, since the compiler doesn't actually execute foo on all stream elements followed by bar on all stream elements. It actually runs the two kernels back-to-back on each element. I'd say that in this case, the difference between how I think about the program and how it actually runs on the machine is much greater than in the CUDA or ISPC case.
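
To make that concrete, here's a hypothetical sketch in CUDA terms (the kernel bodies are made up for illustration; the slide's actual foo and bar may differ) of the two separate kernels versus the fused code a stream compiler might effectively generate:

```
// Hypothetical sketch: foo and bar as separate kernels. Each performs
// 1 arithmetic op per element against 1 global load + 1 global store,
// so inspecting either kernel alone suggests 1 op per 2 memory accesses.
__global__ void foo(const float* in, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * 2.0f;
}

__global__ void bar(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;
}

// What a stream compiler might generate instead: the two kernels fused,
// so the intermediate value never touches global memory. Now it's 2 ops
// per 2 memory accesses -- double the arithmetic intensity of either
// kernel viewed in isolation.
__global__ void foo_then_bar(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = in[i] * 2.0f;   // intermediate stays in a register
        out[i] = t + 1.0f;
    }
}
```

If the compiler performs this fusion behind your back, the arithmetic intensity you'd compute by reading foo and bar is simply wrong for the program that actually runs, which is exactly the "high abstraction distance" problem.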

If you were a programmer who didn't want to think about performance (and wanted the compiler to handle it for you), you'd likely prefer the simplicity of the pure stream programming model. If you had ideas about how to schedule the algorithm on a machine to obtain the best performance, you'd want to be able to express the program in a manner closer to how it actually runs.

If you are using CUDA, you probably care about performance. So NVIDIA made the decision to make CUDA a lower-level programming model that closely resembles how GPUs run code. Higher-level abstractions or new languages can be layered on top of CUDA to provide an interface more suitable for novices. And you see this with systems like NumPy, which are designed for programmers who prioritize convenience.
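
As a toy illustration of that layering (the scale() helper below is something I made up, not any particular library's API), a convenience layer can hide all of CUDA's performance-relevant decisions behind a single call:

```
#include <cuda_runtime.h>
#include <vector>

__global__ void scale_kernel(float* d, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= a;
}

// All performance decisions -- block size, synchronous transfers,
// allocation per call -- are made in here, out of the caller's sight.
void scale(std::vector<float>& v, float a) {
    float* d = nullptr;
    size_t bytes = v.size() * sizeof(float);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (int)((v.size() + threads - 1) / threads);
    scale_kernel<<<blocks, threads>>>(d, a, (int)v.size());
    cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

A caller of scale() never sees launch configurations or transfers, which is the convenience/visibility tradeoff in miniature: great if you don't want to think about performance, frustrating if the hidden choices (like the synchronous copies here) turn out to be the bottleneck.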

ZoSo

This being said, does it mean that the CUDA compiler is not as robust at optimization? If, say, a programmer does not wish to or does not know how to optimize on their own, but instead relies on the compiler, is CUDA still a good choice?