Professor Kayvon mentioned that warps are related to execution contexts: 32 execution contexts share one instruction stream, similar to when we talked about SIMD vector instructions on x86. He also noted that the threads within a block execute in a SIMD-like manner.
I'm still a little confused by the idea that on a GPU, blocks are "spawned" just as threads are spawned on a CPU. What, then, are a task and the individual threads within each block analogous to?
@makingthingsfast: Does my comment on slide 60 help? A CUDA thread (like any logical thread) needs a hardware execution context to run it. On a GPU, 32 CUDA threads share an instruction stream, and we refer to the 32 execution contexts that share an instruction stream as a warp.
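To make the thread-to-warp grouping concrete, here is a small Python sketch (not CUDA code) of how a block's threads are numbered and split into warps. It assumes the rule described in the CUDA Programming Guide: thread indices are linearized with x varying fastest, and consecutive groups of 32 linear IDs form a warp. The helper names `linear_thread_id`, `warp_and_lane`, and `warps_in_block` are hypothetical.

```python
WARP_SIZE = 32  # number of CUDA threads (execution contexts) per NVIDIA warp

def linear_thread_id(tx, ty, tz, dim_x, dim_y):
    """Flatten (threadIdx.x, threadIdx.y, threadIdx.z) with x varying fastest."""
    return tx + ty * dim_x + tz * dim_x * dim_y

def warp_and_lane(tid, warp_size=WARP_SIZE):
    """Return (warp index within the block, lane within that warp)."""
    return tid // warp_size, tid % warp_size

def warps_in_block(num_threads, warp_size=WARP_SIZE):
    """A block of T threads is split into ceil(T / 32) warps; the last may be partial."""
    return (num_threads + warp_size - 1) // warp_size

if __name__ == "__main__":
    print(warp_and_lane(70))    # thread 70 of a 1D block
    print(warps_in_block(100))  # a 100-thread block
```

For example, thread 70 of a 1D block lands in warp 2 at lane 6, and a 100-thread block still needs 4 warps, with the last one only partly filled.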
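It's perfectly reasonable to think of things in two ways:
1. Each CUDA thread has its own hardware execution context, and a warp is a group of 32 execution contexts that share a single instruction stream.
2. A warp is a single thread of control (one execution context) that executes 32-wide SIMD instructions, with each CUDA thread corresponding to one SIMD lane.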
The former explanation is a bit closer to how a GPU microarchitecture is actually implemented. The latter establishes a more obvious, and often very helpful, correspondence between an NVIDIA GPU warp and a traditional CPU thread executing explicit SIMD instructions (e.g., AVX vector instructions).
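The two views describe the same computation. The toy Python simulation below is a sketch, not a model of real GPU hardware: it runs a small branchy per-thread function both ways. First, as 32 independent logical threads, each with its own context (the former view); then as one instruction stream whose operations act on all 32 lanes at once, using a per-lane mask to handle the branch, much like masked AVX code on a CPU (the latter view). The function names and the example computation are hypothetical.

```python
WARP_SIZE = 32

def scalar_thread(x):
    # Former view: the code one CUDA thread runs in its own execution context.
    if x > 0:
        return 2 * x
    return -x

def run_as_independent_threads(xs):
    # 32 logical threads, each running the scalar code on its own element.
    return [scalar_thread(x) for x in xs]

def run_as_simd_warp(xs):
    # Latter view: one thread of control; each "instruction" below acts on all lanes.
    assert len(xs) == WARP_SIZE
    mask = [x > 0 for x in xs]                 # vector compare -> per-lane mask
    out = [0] * WARP_SIZE
    # "then" path: computed for every lane, but only lanes with mask=True keep it
    doubled = [2 * x for x in xs]              # vector multiply
    out = [d if m else o for d, m, o in zip(doubled, mask, out)]
    # "else" path: same idea with the mask inverted, so both paths are executed
    negated = [-x for x in xs]                 # vector negate
    out = [n if not m else o for n, m, o in zip(negated, mask, out)]
    return out

if __name__ == "__main__":
    xs = list(range(-16, 16))
    print(run_as_simd_warp(xs) == run_as_independent_threads(xs))
```

Both functions produce identical results. The masked version also shows why divergent branches cost time under the SIMD view: the single instruction stream steps through both paths.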
I usually prefer that students think of NVIDIA GPU cores as being 64-way hardware multi-threaded (the maximum number of warps per SMM), rather than 2,048-way (64 x 32) multi-threaded (the number of CUDA threads per SMM), since a warp corresponds more directly to a traditional CPU thread.
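The two counts differ only by the warp width. Here is a tiny sketch of that arithmetic, assuming the per-SMM figures quoted above (64 warps per SMM, 32 CUDA threads per warp). These are constants typed in by hand, not values queried from a real device. The function name `hardware_thread_counts` is hypothetical.

```python
# Assumed SMM figures from the discussion above; not queried from a real GPU.
MAX_WARPS_PER_SMM = 64   # independent instruction streams (the "64-way" view)
THREADS_PER_WARP = 32    # execution contexts sharing one instruction stream

def hardware_thread_counts(warps_per_smm=MAX_WARPS_PER_SMM, warp_width=THREADS_PER_WARP):
    """Return (instruction streams, CUDA threads) resident on one SMM."""
    return warps_per_smm, warps_per_smm * warp_width

streams, cuda_threads = hardware_thread_counts()
print(f"{streams}-way multi-threaded by warps, {cuda_threads} CUDA threads")
```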