tianyih

So CUDA is an abstraction, while warps are part of the implementation. To my knowledge, a good abstraction should hide all the details beneath it and just provide an interface for its users. But in this case, if I set the block size to 256 and use synchronization within each block, that creates dependencies among the threads and could lead to deadlock, right? So when writing CUDA code, I would have to know about the hardware details, otherwise I might get unexpected results. In that sense, is CUDA a bad abstraction?

kayvonf

@tianyih: This slide concerns implementation details. It says that IF a CUDA implementation were not able to run all threads in a thread block concurrently, then certain valid CUDA programs could in fact deadlock. Clearly this is not good, so a correct CUDA implementation must be able to run all threads in a thread block concurrently.
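
To make the scenario concrete, here is a minimal sketch of the kind of kernel under discussion (the kernel and buffer names are mine, not necessarily the code on the slide): all 256 threads of a block communicate through shared memory across a `__syncthreads()` barrier, so all 256 must be able to reach the barrier.

```cuda
#define THREADS_PER_BLOCK 256

// Sketch only: each thread stages its input element in shared memory,
// then reads its left neighbor's value after the barrier.
__global__ void shift_right(const float* input, float* output, int N) {
    __shared__ float support[THREADS_PER_BLOCK];
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (idx < N)
        support[threadIdx.x] = input[idx];

    // Barrier: every thread in this 256-thread block must arrive here.
    // If an implementation ran, say, only 128 of the block's threads to
    // completion before starting the other 128, the first 128 would wait
    // at this barrier forever for threads that never get scheduled: deadlock.
    __syncthreads();

    if (idx < N && threadIdx.x > 0)
        output[idx] = support[threadIdx.x - 1];
}
```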

As a CUDA programmer, you do not need to worry about the implementation details of GPU scheduling to ensure your program is correct (unless you write CUDA code in a very specific way that violates the spirit of the independent-blocks abstraction, as discussed here). The program shown on this slide is a valid CUDA program, and any chip that claims to be a CUDA implementation will run it correctly. Otherwise, you'd throw it out and buy a new one!
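
By contrast, here is a hypothetical sketch (my example, not code from the lecture) of what violating the independent-blocks abstraction looks like: a program whose correctness depends on how the implementation schedules blocks relative to one another, which CUDA does not guarantee.

```cuda
// Anti-pattern: blocks other than block 0 spin-wait on a global flag that
// only block 0 sets. CUDA makes no guarantee that block 0 runs concurrently
// with (or before) the spinning blocks, so on an implementation that runs
// blocks one at a time, a spinning block scheduled first never finishes.
__global__ void bad_inter_block_wait(volatile int* flag) {
    if (blockIdx.x == 0) {
        if (threadIdx.x == 0)
            *flag = 1;                       // block 0 signals completion
    } else {
        while (*flag == 0) { /* spin */ }    // other blocks wait on block 0
    }
}
```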

Summary: this slide is only pointing out how CUDA's abstractions constrain what implementations are possible. The implementation proposed on the slide is not a correct one!