grose

Where does the number 128 here come from? (1st line under "In the case of convolve()")

azeng

In the source code for convolve(), the number of threads per block was defined as 128 in a preprocessor directive. It's 128 since we can run four warp execution contexts at the same time, each with 32 threads, for a total of 128 threads.
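
For concreteness, here is a minimal sketch of how that constant typically enters the program. The identifier THREADS_PER_BLK and the kernel body are illustrative assumptions, not the actual course source:

```
// Sketch only: the 128 comes from a preprocessor constant used both to size
// each thread block and to compute the number of blocks at launch time.
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLK 128   // threads per CUDA thread block (assumed name)

__global__ void convolve(int N, float* input, float* output) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;   // global element index
    if (index < N) {
        // 3-point moving average as a stand-in for the real kernel body;
        // assumes input holds N+2 elements so index+2 is always valid.
        float result = 0.0f;
        for (int i = 0; i < 3; i++)
            result += input[index + i];
        output[index] = result / 3.0f;
    }
}

int main() {
    const int N = 1024 * 1024;
    float *devInput, *devOutput;
    cudaMalloc(&devInput, (N + 2) * sizeof(float));   // inputs left uninitialized for brevity
    cudaMalloc(&devOutput, N * sizeof(float));

    // N threads total, grouped into blocks of THREADS_PER_BLK = 128 threads.
    int numBlocks = (N + THREADS_PER_BLK - 1) / THREADS_PER_BLK;
    convolve<<<numBlocks, THREADS_PER_BLK>>>(N, devInput, devOutput);
    cudaDeviceSynchronize();

    cudaFree(devInput);
    cudaFree(devOutput);
    return 0;
}
```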

kayvonf

@azeng. Your comment was correct that 128 came from the program. But I do want to clarify that the SMX cores in this specific GPU can each run up to 2048 threads concurrently (64 warps' worth). The number 128 in the example code is a bit arbitrary, but clearly it's a good idea for the number of threads in a CUDA thread block to be a multiple of the warp size. Given our implementation of the code, this GPU could run multiple thread blocks of our convolve problem concurrently on each SMX core.
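
To make the arithmetic concrete: 2048 thread slots per SMX divided by 128 threads per block allows up to 16 resident convolve blocks per SMX, assuming no other resource (shared memory, registers) imposes a tighter limit. A small sketch using CUDA's occupancy query to check this on real hardware; the placeholder kernel is an assumption, not the course code:

```
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLK 128

// Trivial placeholder kernel so the occupancy query below is self-contained;
// substitute the real convolve() to measure the actual program.
__global__ void convolve(int N, float* input, float* output) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) output[i] = input[i];
}

int main() {
    int blocksPerSM = 0;
    // Ask the runtime how many 128-thread blocks of this kernel can be
    // resident on one SM, given its register and shared-memory usage.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, convolve, THREADS_PER_BLK, /*dynamicSMemSize=*/0);
    // The 2048-thread limit alone allows 2048 / 128 = 16 blocks per SMX;
    // the number reported here may be lower if another resource binds first.
    printf("Resident blocks per SM: %d\n", blocksPerSM);
    return 0;
}
```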