Slide View : Parallel Computer Architecture and Programming : 15-418/618 Fall 2016

Slide 45 of 78

bob_sacamano

blockIdx, blockDim and threadIdx are all of type dim3, which implies that each of them can have coordinates in 3 dimensions. However, I don't understand why each of these variables needs 3 dimensions. Doesn't the hierarchy of grids, blocks and threads already give us a notion of 3 dimensions?

bpr

Since both block and thread IDs are of type dim3, there are actually 6 dimensions identifying each executing CUDA thread.

blockDim is a "constant" value that tells the kernel the size of the block. blockIdx identifies which block in the grid is executing. threadIdx identifies which thread in that block is executing.
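To make the "6 dimensions" concrete, here is a rough sketch of how the block and thread coordinates could be combined into a single unique ID. It is plain C++ rather than CUDA so it can run without a GPU. Dim3 is a hypothetical stand-in for CUDA's dim3, and flatten and globalThreadId are illustrative names, not CUDA APIs.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Flatten a 3D coordinate inside a 3D extent, with x varying fastest.
unsigned flatten(Dim3 idx, Dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// One unique ID per thread in the launch:
// (flat block ID) * (threads per block) + (flat thread ID within the block).
unsigned globalThreadId(Dim3 blockIdx, Dim3 gridDim, Dim3 threadIdx, Dim3 blockDim) {
    unsigned threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    return flatten(blockIdx, gridDim) * threadsPerBlock + flatten(threadIdx, blockDim);
}
```

For example, with a 2 x 2 x 1 grid of 4 x 2 x 1 blocks (32 threads in total), thread (3, 1, 0) of block (1, 1, 0) gets ID 3 * 8 + 7 = 31, the last thread of the launch.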

Holladay

In the host code, we seem to pass the arrays devInput and devOutput as arguments to our CUDA call. What is the difference between doing this and using cudaMemcpy? And when would you use one and not the other?

taoy1

@Holladay I believe devInput and devOutput are float* pointers to device memory, and the cudaMemcpy call goes where the "// properly initialize contents of devInput here ..." comment is. You can see how cudaMalloc and cudaMemcpy are used on this slide: slide 42

The complete host code should be:

int N = 1024 * 1024;
float *devInput, *devOutput;
cudaMalloc(&devInput, sizeof(float) * (N + 2));  // input has N+2 elements
cudaMalloc(&devOutput, sizeof(float) * N);

// properly initialize contents of devInput here ...
// cudaMemcpy takes a size in bytes; Input is assumed to be a host array of N+2 floats
cudaMemcpy(devInput, Input, sizeof(float) * (N + 2), cudaMemcpyHostToDevice);

convolve<<<N/THREADS_PER_BLK, THREADS_PER_BLK>>>(N, devInput, devOutput);

Iamme

How does this take advantage of the 2-dimensional structure of the blocks? It seems that assigning 'index' based only on the x values would cause every thread in a column to do exactly the same work, operating on the same index in the array.

ferozenaina

The 2D/3D shape of a block is not fixed; you choose it at launch. During execution, the dimensions are not treated differently from one another.

Note: The maximum number of threads in a block is 1024 for compute capability 5.3, so a block can be 1024 x 1 x 1 or 32 x 32 x 1.
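As a small sketch of that limit, the check below multiplies the three block dimensions and compares the product against a per-block maximum. It is plain C++ so it runs without a GPU. Dim3 is a hypothetical stand-in for CUDA's dim3, fitsInBlock is an illustrative name, and the default of 1024 is simply the figure quoted above, so other devices may need a different value.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// True if a block of this shape stays within the per-block thread limit.
// The limit is a parameter because it depends on the device.
bool fitsInBlock(Dim3 blockDim, unsigned maxThreadsPerBlock = 1024) {
    // Use a 64-bit product so large shapes cannot overflow before the comparison.
    unsigned long long total = 1ULL * blockDim.x * blockDim.y * blockDim.z;
    return total <= maxThreadsPerBlock;
}
```

Both 1024 x 1 x 1 and 32 x 32 x 1 pass this check, while 32 x 32 x 2 does not. In real CUDA code the limit can be read from the maxThreadsPerBlock field that cudaGetDeviceProperties fills in, rather than hard-coded.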

The 3D indexing is provided more for programmer convenience: images are typically 2D and are easier to index that way. Point-cloud data has x, y and z coordinates, so 3D blocks can be used for it.
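For the image case, a 2D launch might compute a pixel's position roughly as sketched below. This is plain C++ modelling the arithmetic a kernel would do, so it runs without a GPU. Dim3 is a hypothetical stand-in for CUDA's dim3, and pixelIndex and the row-major layout with a given width are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Map a thread's 2D block/thread coordinates to a row-major pixel index.
unsigned pixelIndex(Dim3 blockIdx, Dim3 blockDim, Dim3 threadIdx, unsigned width) {
    unsigned col = blockIdx.x * blockDim.x + threadIdx.x;  // x selects the column
    unsigned row = blockIdx.y * blockDim.y + threadIdx.y;  // y selects the row
    return row * width + col;
}
```

With 16 x 16 blocks on a 64-pixel-wide image, thread (3, 5) of block (2, 1) lands on column 35, row 21, which is index 21 * 64 + 35 = 1379. When a launch passes a plain integer for the block size, as in the 1D convolve example, the y and z extents are 1, so threadIdx.y is always 0 and no two threads share an index.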