Slide 36 of 79
kayvonf

Question: If you were to contrast a CUDA thread with a traditional pthread, how would you make the comparison? (Hint: you will probably need to talk about implementation details to answer this one clearly.)

bojianh

I am only going to answer the pthread portion here; someone else can answer the CUDA thread part. Creating a pthread involves work on both sides of the kernel boundary:

On the kernel end

  1. Allocate a new thread control block for the pthread

  2. Allocate a new kernel stack for it

  3. Add the new thread to the scheduler

On the user end

  1. Get a new user stack (malloc or new_pages) and set it up

  2. Make a user-level thread control block

  3. Call into the kernel to get a new kernel thread

  4. In the child thread, switch onto the new user stack
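As a rough illustration of the user-side steps above, here is a minimal C sketch that allocates its own stack and hands it to the pthread library. The names `double_value` and `run_on_own_stack` and the 1 MiB stack size are made up for this example. The user-level control block, the call into the kernel, and the stack switch all happen inside `pthread_create`, so they are not visible here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical worker: doubles the int it is handed. */
static void *double_value(void *arg) {
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Runs double_value on a new pthread whose user stack we allocate ourselves.
 * Returns 0 on success, a nonzero error code otherwise. */
int run_on_own_stack(int *value) {
    size_t stack_size = (size_t)1 << 20;   /* 1 MiB, an arbitrary choice */
    long page_size = sysconf(_SC_PAGESIZE);
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(value != NULL);
    if (page_size <= 0)
        return -1;

    /* User end, step 1: get a new, page-aligned user stack. */
    rc = posix_memalign(&stack, (size_t)page_size, stack_size);
    if (rc != 0)
        return rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        free(stack);
        return rc;
    }

    /* Tell the library to use our stack instead of allocating one. */
    rc = pthread_attr_setstack(&attr, stack, stack_size);
    if (rc == 0) {
        /* Remaining steps (control block, kernel thread, stack switch)
         * are performed inside pthread_create. */
        rc = pthread_create(&tid, &attr, double_value, value);
        if (rc == 0)
            rc = pthread_join(tid, NULL);
    }

    pthread_attr_destroy(&attr);
    free(stack);   /* safe: the thread has been joined or was never started */
    return rc;
}
```

Calling `run_on_own_stack(&v)` from `main` with an `int v` runs the worker on the separately allocated stack and returns once the thread has been joined. The point is how much per-thread state (a whole stack plus kernel bookkeeping) a single pthread carries.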

MaxFlowMinCut

As opposed to a pthread, a CUDA thread is better thought of as a program instance in an ISPC gang than as an actual hardware thread. On a GPU core, 32 CUDA threads share a single warp execution context. When the core decides that a warp should be executed, its 32 CUDA threads are executed together on a 32-wide SIMD ALU. This is in contrast to a pthread, which has its own execution context and can logically be considered its own hardware execution thread.

Another way of thinking about it:

  1. A pthread can be considered a "hardware thread", with its own execution context, its own single instruction stream, and its own logical execution. A pthread may execute SIMD instructions over a set of inputs on an ALU.

  2. A CUDA thread can be considered more as a lane in a SIMD vector. Its execution context logically belongs to a warp, which groups 32 CUDA threads for execution on a 32-wide SIMD ALU.
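To make the "lane in a SIMD vector" picture concrete, here is a small plain-C model of a warp. It is only a sketch of the idea, not how the hardware is built. The names `warp_t`, `warp_negate`, `warp_double`, and `warp_run` are hypothetical, and the `issued` counter and per-lane bitmask are assumptions of this model. Every lane runs the same per-lane program, `if (v < 0) v = -v; else v = 2 * v;`, but the warp issues each instruction once for all 32 lanes and uses the mask to decide which lanes commit a result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical model: one instruction stream shared by 32 lanes of data. */
typedef struct {
    int lane_value[WARP_SIZE];  /* per-lane ("CUDA thread") data */
    int issued;                 /* warp-wide instructions issued so far */
} warp_t;

/* One warp-wide "negate": only lanes whose mask bit is set commit. */
static void warp_negate(warp_t *w, uint32_t mask) {
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (mask & (UINT32_C(1) << lane))
            w->lane_value[lane] = -w->lane_value[lane];
    w->issued++;
}

/* One warp-wide "double": only lanes whose mask bit is set commit. */
static void warp_double(warp_t *w, uint32_t mask) {
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (mask & (UINT32_C(1) << lane))
            w->lane_value[lane] *= 2;
    w->issued++;
}

/* Runs the per-lane program for the whole warp and returns the number of
 * warp-wide instructions issued. */
int warp_run(warp_t *w) {
    uint32_t neg_mask = 0;

    assert(w != NULL);

    /* One warp-wide compare produces a per-lane mask. */
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (w->lane_value[lane] < 0)
            neg_mask |= UINT32_C(1) << lane;
    w->issued++;

    /* Each side of the branch is issued only if some lane takes it. */
    uint32_t pos_mask = ~neg_mask;
    if (neg_mask != 0)
        warp_negate(w, neg_mask);
    if (pos_mask != 0)
        warp_double(w, pos_mask);

    return w->issued;
}
```

In this model, a warp whose lanes all take the same side of the branch issues two instructions, while a warp with mixed signs issues three. With 32 pthreads, each thread would instead fetch and execute its own instruction stream.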

If anything, a warp can be considered somewhat similar to a wider version of a pthread, though pthreads have no concept of a thread block, so the analogy is imperfect.