Slide View : 15-418 Spring 2013


stephyeung

Question: Is CUDA an example of a shared address space model or a message passing model? Variables can have the __shared__ attribute, but we also said cudaMemcpy is like message passing.


Amanda

Well, CUDA definitely uses a shared address space for threads. All threads within a block can communicate by reading from and writing to variables in shared memory. There is also global memory, a per-device address space that threads from any block on that device can read and write.
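To make the two levels of sharing concrete, here is a minimal sketch (not from the lecture). The kernel name reverseTiles, the host helper runReverseTiles, and the block size of 128 are all assumptions made for illustration. Threads in a block exchange values through a __shared__ array, and the result lands in global memory, which any block can read.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 128   // assumed block size for this sketch

// Each block reverses its own tile of `in` through block-local shared memory
// and writes the result to `out`, which lives in global memory.
__global__ void reverseTiles(const int* in, int* out) {
    __shared__ int tile[THREADS_PER_BLOCK];   // visible only to threads of this block

    int local = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;

    tile[local] = in[global];   // each thread publishes one value
    __syncthreads();            // wait until every thread in the block has written

    // Read a value that a different thread of the same block wrote.
    out[global] = tile[blockDim.x - 1 - local];
}

// Hypothetical host helper; n must be a multiple of THREADS_PER_BLOCK.
bool runReverseTiles(const int* hostIn, int* hostOut, int n) {
    int* devIn = nullptr;
    int* devOut = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&devIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, bytes) != cudaSuccess) { cudaFree(devIn); return false; }

    cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);
    reverseTiles<<<n / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(devIn, devOut);
    bool ok = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(devIn);
    cudaFree(devOut);
    return ok;
}
```

The __syncthreads() barrier is what makes the read safe: without it, a thread could read a tile slot before its neighbor has written it.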

We do have the cudaMemcpy operation between the host and device, and while it's the main way we've seen of passing information between the two, I'm not sure that it entirely fits the 'message passing' model we saw in class. Our models seemed to describe a means of communicating between concurrent threads, which is not what is happening between the host and device. At least, not in the examples we've seen so far.


martin

To the first question, CUDA should be considered a data-parallel programming model, because a kernel is launched over numberBlocks blocks of threadsperblock threads each (both can be up to 3D).
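As a rough illustration of that launch shape, here is a sketch of a 2D launch. The names fillIndex and lastIndex, the 16x16 block shape, and the dim3 variables numBlocks and threadsPerBlock are assumptions for this example, not taken from the slide.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every CUDA thread handles one (x, y) element of a width-by-height array.
__global__ void fillIndex(int* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        out[y * width + x] = y * width + x;   // store the element's linear index
    }
}

// Hypothetical helper: launches the kernel and returns the last element
// of the array, or -1 if device allocation fails.
int lastIndex(int width, int height) {
    int n = width * height;
    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return -1;

    dim3 threadsPerBlock(16, 16);             // 256 threads per block, arranged in 2D
    dim3 numBlocks((width + 15) / 16,         // round up so the grid covers the array
                   (height + 15) / 16);
    fillIndex<<<numBlocks, threadsPerBlock>>>(dev, width, height);

    std::vector<int> host(n);
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[n - 1];
}
```

The bounds check in the kernel matters because the grid is rounded up, so some threads fall outside the array when width or height is not a multiple of 16.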

To the second question, CUDA is also an example of a shared address space model. As mentioned above, there is global memory shared by all threads on the device, shared memory per block, and a memory space accessible only to each individual thread.

To the third question, cudaMemcpy is an example of message passing, as it's the primary mechanism for passing data between the host and device address spaces.
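A small sketch of why this looks like message passing: the host and device each hold their own copy of the data, and nothing moves between them except through an explicit copy. The names addOne and roundTrip are hypothetical, and the array size of 256 is an arbitrary choice for the example.

```cuda
#include <cuda_runtime.h>

__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: fills a host array with `initial`, sends it to the
// device, increments it there, and copies it back. Returns element 0 as the
// host sees it afterwards, or -1 if device allocation fails.
int roundTrip(int initial) {
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; i++) host[i] = initial;

    int* dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return -1;

    // "Send": copy the host array into the device address space.
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    // This write touches only the host copy; the device copy is unaffected.
    host[0] = -100;

    addOne<<<(n + 127) / 128, 128>>>(dev, n);

    // "Receive": copy the device array back, overwriting the host copy.
    // This blocking copy runs after the kernel has finished.
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[0];
}
```

The host write to host[0] between the two copies is lost, which is the behavior you would expect from two separate address spaces rather than one shared one.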

Compared to ISPC instances, threadIdx is similar to programIndex, and blockDim is similar to programCount.


kayvonf

@martin: While CUDA's blockDim can be compared to ISPC's programCount in that both define the number of logical threads of control in a grouping, there's one small difference when considering implementation. The threads in a CUDA thread block do not necessarily all execute in SIMD lockstep, but all the instances in an ISPC gang do. Therefore, when considering implementation details, ISPC's programCount is actually most similar to CUDA's warpSize. Both values give the number of program instances (or CUDA threads) that execute in SIMD lockstep.
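To see the distinction in code, here is a sketch that splits a 1D thread block into warps using the built-in warpSize variable. The names recordLanes and warpsPerBlock are hypothetical, and the helper assumes a 1D block of at most 1024 threads.

```cuda
#include <cuda_runtime.h>
#include <vector>

// For a 1D block, consecutive threadIdx.x values are grouped into warps.
__global__ void recordLanes(int* warpOf, int* laneOf) {
    int t = threadIdx.x;
    warpOf[t] = t / warpSize;   // which warp of the block this thread belongs to
    laneOf[t] = t % warpSize;   // position within the warp (its SIMD lane)
}

// Hypothetical helper: returns how many warps a 1D block of blockSize
// threads occupies, or -1 if device allocation fails.
int warpsPerBlock(int blockSize) {
    int* warpOf = nullptr;
    int* laneOf = nullptr;
    size_t bytes = blockSize * sizeof(int);
    if (cudaMalloc(&warpOf, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&laneOf, bytes) != cudaSuccess) { cudaFree(warpOf); return -1; }

    recordLanes<<<1, blockSize>>>(warpOf, laneOf);

    std::vector<int> hostWarpOf(blockSize);
    cudaMemcpy(hostWarpOf.data(), warpOf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(warpOf);
    cudaFree(laneOf);

    // The last thread's warp index plus one is the number of warps in the block.
    return hostWarpOf[blockSize - 1] + 1;
}
```

On hardware where warpSize is 32, a block with blockDim.x of 128 is split into 4 warps. Here blockDim counts all the logical threads in the block, while warpSize is the width of the group that runs in lockstep.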
