Slide 26 of 54
jpaulson

The # of threads launched is not necessarily equal to the # of data points to process; the programmer is responsible for managing the assignment of work (e.g., elements of an array) to CUDA threads.

Mayank

As pointed out by someone in the lecture, the number of blocks in each dimension must be rounded up: instead of Nx/threadsPerBlock.x, use (Nx + threadsPerBlock.x - 1)/threadsPerBlock.x, and similarly for the y-dimension. In this example each block has 4x3 threads (the third dimension is 1), so the grid is 3x2x1 blocks. To elaborate on @jpaulson's point, the total number of threads launched is 12 * 6 = 72 (3 blocks * 4 threads along x, 2 blocks * 3 threads along y), but only 11 * 5 = 55 of them actually do useful work.

LeeK

Just to elaborate on the last two posts: there are 3x2 = 6 blocks with 4x3 = 12 threads each, for a total of 72 threads, while the matrices being added are only 11x5. Because of the way matrixAdd maps threads to array indices, some threads are assigned indices that fall outside the array, so the if statement is necessary to prevent out-of-bounds array accesses.