kayvonf

Question: Here's a subtle question. If you can provide a good answer to this question, you've got a great handle on how to think about CUDA's abstractions. Would you say CUDA encourages the programmer to think in a manner that is similar to data-parallel/stream programming models, or more like a shared address space threaded programming model? Or both?

paracon

CUDA encourages the programmer to think along the lines of a data-parallel/stream programming model by asking for a kernel function that can be independently applied to streams (sequences of elements that can be processed independently). The kernel function is then executed on the SMs of the GPU. (A minimal sketch of this style is below.)
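Here is a minimal sketch of that data-parallel style, using the classic saxpy computation (the kernel name and launch parameters are illustrative, not from the slide): each CUDA thread independently maps one element of the input stream to one element of the output, with no inter-thread communication.

```cpp
// Data-parallel style: each thread processes one element independently.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    // guard: the grid may be larger than n
        y[i] = a * x[i] + y[i];
}

// Host-side launch: the programmer describes the per-element computation
// and lets the CUDA runtime map threads onto the SMs.
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
```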

Though CUDA does provide shared memory regions (global device memory shared across blocks, and per-block shared memory among the threads of a block), the basic programming model is not the same as a shared address space threading model.

nemo

@paracon I agree with the first point, but I think that it does have some aspects of a shared address space threaded programming model. Being able to use __syncthreads() as a barrier for all threads in the block is an example of that. So, I would say both.
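To make that concrete, here is a small sketch (the kernel and the shift pattern are illustrative) where threads of a block cooperate through __shared__ memory, and __syncthreads() acts exactly like a barrier in a shared address space program:

```cpp
#define BLOCK 128

// Threads in a block stage data cooperatively in on-chip shared memory.
// __syncthreads() is a barrier: no thread proceeds past it until every
// thread in the block has reached it.
__global__ void shift_left(const float* in, float* out) {
    __shared__ float tile[BLOCK + 1];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = in[i];
    if (threadIdx.x == 0)            // one thread fetches the extra element
        tile[BLOCK] = in[i + BLOCK]; // (boundary of the last block omitted)
    __syncthreads();                 // wait until all writes to tile finish

    out[i] = tile[threadIdx.x + 1];  // safely read a neighbor's value
}
```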

pdp

I would agree with nemo. Having a shared address space for all threads in a thread block, plus a global device memory, plus the notion of all threads in a thread block running concurrently and cooperating, makes a programmer think like a shared address space threaded programming model. At the same time, having a warp execute 32 threads in SIMD fashion, with all threads of a block running on a single core, makes a programmer think like a data-parallel/stream programming model. So it has both elements in its abstraction.
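Both styles show up together in the canonical block-level reduction (a standard CUDA pattern, sketched here as an illustration rather than taken from the lecture): the grid-wide launch is pure data parallelism, while inside each block threads communicate through shared memory and barriers.

```cpp
// Data-parallel at the grid level: each block independently reduces its
// slice of the input. Shared-address-space style within the block: threads
// communicate through `partial` and synchronize with __syncthreads().
__global__ void block_sum(const float* in, float* block_out, int n) {
    __shared__ float partial[256];   // assumes blockDim.x == 256
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: each round, half the threads add the other half's values.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();             // barrier before the next round
    }

    if (threadIdx.x == 0)            // thread 0 publishes the block's sum
        block_out[blockIdx.x] = partial[0];
}
```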

albusshin

Seconded. The thinking process of dividing the whole workload across different threads is data-parallel, and the way shared variables are used corresponds to the shared address space threaded programming model.

fxffx

I think CUDA encourages the programmer to think in a manner that is similar to data-parallel/stream programming models, because threads are grouped into warps and executed together, so it's more natural to have all these threads execute the same instructions on different parts of the data.