Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

GPU Architecture and CUDA Programming

williamx

When we check whether i and j are in bounds, why do we check the upper bounds but not the lower bounds?

ggm8

Can the programmer set numBlocks and threadsPerBlock manually to match the problem size of the program, or is some kind of rounding needed?

muchanon

We don't need to check the lower bounds of i and j because the blockIdx, blockDim, and threadIdx values that determine them are never negative. Additionally, I don't believe the computed index will ever overflow and become negative, because there is probably no situation in which you would have that many blocks or threads.

pdp

blockIdx and threadIdx always start at 0, so there is no need for lower-bound checks.
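As a rough illustration of the two answers above, the index arithmetic can be reproduced on the host in plain C++. This is only a sketch: the problem size `Nx`, the block width `blockDimX`, and the helper `globalIndex` are hypothetical names and values chosen for the example, not taken from the slide.

```cpp
// Hypothetical sizes, picked so that Nx is NOT a multiple of the block width.
constexpr unsigned Nx = 10;        // number of elements along x (assumed)
constexpr unsigned blockDimX = 4;  // threads per block along x (assumed)

// Round the block count up so that every element is covered by some thread.
constexpr unsigned numBlocksX = (Nx + blockDimX - 1) / blockDimX;

// Host-side mirror of the usual kernel expression
//   i = blockIdx.x * blockDim.x + threadIdx.x
// All three inputs are unsigned, so the result cannot start out negative.
constexpr int globalIndex(unsigned blockIdx, unsigned blockDim, unsigned threadIdx) {
    return static_cast<int>(blockIdx * blockDim + threadIdx);
}

// First thread of the first block and last thread of the last block.
constexpr int smallestIndex = globalIndex(0, blockDimX, 0);
constexpr int largestIndex  = globalIndex(numBlocksX - 1, blockDimX, blockDimX - 1);

static_assert(smallestIndex == 0, "the lowest index is 0, so no lower-bound check is needed");
static_assert(largestIndex >= static_cast<int>(Nx),
              "rounding up creates surplus threads, so the upper-bound check is needed");
```

With these assumed values the grid has 3 blocks of 4 threads, so indices run from 0 to 11 while only 0 to 9 are valid. That is why the kernel guards the upper bound only.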

paracon

@ggm8, along with considering the parallelism in the program, numBlocks and threadsPerBlock should be set so that they make the best use of the GPU's compute capability. The CUDA Wikipedia page explains why threadsPerBlock should be a multiple of 32. There are also limits on the maximum number of active blocks per processor and the maximum number of active threads per processor, as well as shared memory and register limits (all listed on that Wikipedia page). As CUDA is a very low-level language, we should consider these parameters when setting numBlocks and threadsPerBlock to best utilise the GPU.
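On the rounding part of the question, a common pattern is to fix threadsPerBlock at a multiple of 32 and round the block count up. The sketch below shows that arithmetic as host-side C++. The names `kWarpSize`, `ceilDiv`, and `isWarpMultiple` and the sizes used are illustrative assumptions, not part of the CUDA API.

```cpp
// Warp width: 32 threads on NVIDIA GPUs to date.
constexpr unsigned kWarpSize = 32;

// Integer ceiling division: the smallest block count whose threads cover n elements.
constexpr unsigned ceilDiv(unsigned n, unsigned d) {
    return (n + d - 1) / d;
}

// True when a block size fills whole warps, leaving no partially used warp.
constexpr bool isWarpMultiple(unsigned threadsPerBlock) {
    return threadsPerBlock % kWarpSize == 0;
}

// Hypothetical launch configuration for N = 1000 elements.
constexpr unsigned N = 1000;
constexpr unsigned threadsPerBlock = 128;
constexpr unsigned numBlocks = ceilDiv(N, threadsPerBlock);

static_assert(isWarpMultiple(threadsPerBlock), "128 = 4 warps of 32 threads");
static_assert(numBlocks == 8, "8 blocks * 128 threads = 1024 threads >= 1000 elements");
static_assert(numBlocks * threadsPerBlock - N == 24, "24 surplus threads must be masked off");
```

The surplus threads are the reason a kernel launched this way needs an upper-bound check. The per-processor limits mentioned above still have to be looked up for the specific GPU.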

Abandon

Why is the launch argument for matrixAdd in this example a "dim3" struct, while in the later convolve example it is just an int? How can I tell which kind of argument is needed when I am writing CUDA code?

manishj

@Abandon This link will resolve your query: http://stackoverflow.com/questions/2392250/understanding-cuda-grid-dimensions-block-dimensions-and-threads-organization-s/
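In short, both forms end up as a dim3: CUDA's dim3 has a constructor whose three components each default to 1, so a plain int in the launch configuration is converted to (n, 1, 1). The sketch below imitates that behaviour with a stand-in struct so it compiles without the CUDA toolkit. `Dim3` and `totalThreads` are hypothetical names for this example, and the sizes are made up.

```cpp
// Stand-in for CUDA's dim3 (assumption: it mirrors dim3's constructor,
// where every component defaults to 1).
struct Dim3 {
    unsigned x, y, z;
    constexpr Dim3(unsigned vx = 1, unsigned vy = 1, unsigned vz = 1)
        : x(vx), y(vy), z(vz) {}
};

// Total number of threads launched for a given grid and block shape.
constexpr unsigned totalThreads(Dim3 grid, Dim3 block) {
    return grid.x * grid.y * grid.z * block.x * block.y * block.z;
}

// 1D launch: plain integers convert implicitly to Dim3(n, 1, 1).
static_assert(totalThreads(8, 128) == 1024, "8 blocks of 128 threads");

// 2D launch: the shape is spelled out, as one would for a matrix.
static_assert(totalThreads(Dim3(3, 2, 1), Dim3(4, 3, 1)) == 72, "6 blocks of 12 threads");
```

So the choice follows the data layout: an int is enough for a 1D array, while an explicit dim3 is convenient when the kernel indexes in two or three dimensions.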