Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Slide 36 of 79

kayvonf

Question: How would you contrast a CUDA thread with a traditional pthread? (Hint: you will probably need to talk about implementation details to answer this one clearly.)

bojianh

I am only going to answer the pthread portion here. Someone else can answer the CUDA thread part.

On the kernel side

Allocate a new thread control block for the pthread
Allocate a new kernel stack for it
Add the thread to the scheduler

On the user side

Allocate a new user stack (malloc or new_pages) and set it up
Create a user-level thread control block
Call into the kernel to get a new kernel thread
Switch to the new stack if running as the child thread

MaxFlowMinCut

As opposed to a pthread, a CUDA thread is better thought of as a program instance in an ISPC gang than as an actual hardware thread. On a GPU core, 32 CUDA threads share one warp execution context. When the core decides that a warp should be executed, its 32 CUDA threads are executed together on 32-wide SIMD ALUs. This is in contrast to a pthread, which has its own execution context and can logically be considered its own hardware execution thread.

Another way of thinking about it:

1. A pthread can be considered a "hardware thread", with its own execution context, its own single instruction stream, and its own logical execution. A pthread may execute SIMD instructions over a set of inputs on an ALU.

2. A CUDA thread can be considered more as a lane in a SIMD vector. Its execution context logically belongs to a warp, which groups 32 CUDA threads for execution on 32-wide SIMD ALUs.

If anything, a warp is somewhat similar to a pthread that executes wide SIMD instructions, though pthreads have no concept of a thread block, so the analogy is not exact.