Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

qqkk

Where is 6 (the number of blocks) specified in this piece of code? I only see threadsPerBlock(4,3,1).

Funky9000

@qqkk No. of blocks (6) = Nx/threadsPerBlock.x (12/4 = 3) * Ny/threadsPerBlock.y (6/3 = 2) * 1

I think.

efficiens

I think it is 6 blocks, each with 12 threads, so 6*12 = 72 threads in total.

blairwaldorf

If thread IDs can be up to 3-dimensional, and there are multiple threads in a block, that doesn't mean the threads themselves are subdivided, right? In this image of block (1,1) with 12 threads, if the programmer had specified threadsPerBlock as (2,2,3) instead, the block image would be 3-dimensional (with each thread indexed by three components, such as (1,1,1), instead of two).

pavelkang

So what is the 1 in threadsPerBlock(4, 3, 1)? I understand that 4 is the width of the block and 3 is the height.

karima

@pavelkang CUDA supports three-dimensional thread blocks (thus the dim3). We don't use the third dimension here, so its value is 1.
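To make the role of the third component concrete, here is a small host-only C++ sketch. `Dim3` is a hypothetical stand-in for CUDA's `dim3`, written out so the example compiles without the CUDA toolkit; like the real type, it defaults any component that is not given to 1. `threadsInBlock` is also a made-up helper name.

```cpp
#include <cassert>

// Host-only model; Dim3 is a hypothetical stand-in for CUDA's dim3.
struct Dim3 {
    unsigned x, y, z;
    // As with CUDA's dim3, components that are not given default to 1.
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Total number of threads in a block of the given shape.
unsigned threadsInBlock(Dim3 b) {
    // A dimension of 0 would not describe a valid block shape.
    assert(b.x > 0 && b.y > 0 && b.z > 0);
    return b.x * b.y * b.z;
}
```

In this model, `Dim3(4, 3)` and `Dim3(4, 3, 1)` both describe the 12-thread block on the slide, while `Dim3(2, 2, 3)` would describe a 3-dimensional block that also holds 12 threads.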

Black

What's the relationship between threadsPerBlock and performance? How do you choose the right number of threads per block?

TomoA

@Black This is a good explanation for choosing the number of threads per block. In short, after considering the maximum number of threads you are allowed to put in a block, you should make the block size a multiple of the warp size. That makes sense because the block is split into warps, and the threads of a warp execute together. You should also select a number that best hides memory and instruction latency. It's similar to how we had to choose the optimal number of threads in the quiz to best hide the memory latency involved for each cache size and achieve maximum throughput.

This is actually a question in the CUDA FAQ, which links to this tool for measuring GPU occupancy.
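As a rough illustration of the warp-size point above, the sketch below counts how many warps a block occupies and how many lanes in those warps have no thread assigned. It assumes a warp size of 32 (true of NVIDIA GPUs to date, though real code would query the device rather than hard-code it); `kWarpSize`, `warpsPerBlock`, and `idleLanes` are hypothetical names.

```cpp
#include <cassert>

// Assumption: 32 threads per warp.
const unsigned kWarpSize = 32;

// Warps needed to hold one block. Rounded up, because a partially
// filled warp still occupies a whole warp.
unsigned warpsPerBlock(unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Lanes in the block's warps that have no thread assigned to them.
unsigned idleLanes(unsigned threadsPerBlock) {
    return warpsPerBlock(threadsPerBlock) * kWarpSize - threadsPerBlock;
}
```

Under this assumption, the 12-thread block from the slide occupies one warp and leaves 20 of its 32 lanes idle, whereas a 128-thread block fills exactly four warps.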

msfernan

Just clarifying.

Nx is the width of the grid in the x dimension. Ny is the height of the grid in the y dimension.

Therefore as @Funky9000 said:

 numblocks(Nx/threadsPerBlock.x, Ny/threadsPerBlock.y, 1)

can be written as:

 numblocks(size of grid in x/number of threads in x direction,
           size of grid in y/number of threads in y,
           1)

or

 numblocks(number of blocks in x direction,
           number of blocks in y direction,
           1)

which equals:

 numblocks(12/4 = 3, 6/3 = 2, 1)
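The arithmetic above can be sketched in host-only C++ (a minimal sketch, not the slide's actual code). `Dim3`, `numBlocksFor`, and `totalCount` are hypothetical names, with `Dim3` standing in for CUDA's `dim3`. The sketch assumes, as on the slide, that Nx and Ny are exact multiples of the block dimensions.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 (host-only model).
struct Dim3 { unsigned x, y, z; };

// Number of blocks needed to cover an Nx-by-Ny domain.
// Assumes Nx and Ny divide evenly by the block dimensions.
Dim3 numBlocksFor(unsigned Nx, unsigned Ny, Dim3 threadsPerBlock) {
    assert(Nx % threadsPerBlock.x == 0 && Ny % threadsPerBlock.y == 0);
    Dim3 n = { Nx / threadsPerBlock.x, Ny / threadsPerBlock.y, 1 };
    return n;
}

// Product of the three components: blocks per grid, or threads per block.
unsigned totalCount(Dim3 d) { return d.x * d.y * d.z; }
```

For Nx = 12, Ny = 6, and a block shape of (4, 3, 1), this gives a 3 x 2 x 1 arrangement of blocks, so 6 blocks of 12 threads each.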

418_touhenying

Since it can be up to 3D and the example is 2D, I guess that the "1" argument makes it so? Did I understand this correctly?

karima

@418_touhenying yes, we are not using the third dimension by setting it to 1.

maxdecmeridius

What exactly is a thread block?

kayvonf

@maxdecmeridius. What do you think? Why don't you give a definition a shot?

maxdecmeridius

@kayvonf Here's my thought: A thread block is an abstraction that we use to encompass a set of threads that can communicate with one another. Threads outside of the thread block are independent and cannot sync or communicate with threads within that block.

0xc0ffee

So we have two IDs, each of which can be up to 3-dimensional. I think it makes most sense to think of block ID and thread ID as two completely separate IDs.
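One common way the two IDs get combined is to compute a global position from them. The sketch below models that in host-only C++ for the 2D case. `Idx2` and `globalIndex` are hypothetical names; the parameters play the roles of CUDA's built-in `blockIdx`, `blockDim`, and `threadIdx`, and all IDs are zero-based.

```cpp
#include <cassert>

// Hypothetical 2D index type for this host-only model.
struct Idx2 { unsigned x, y; };

// Per dimension: global position = block ID * block size + thread ID.
Idx2 globalIndex(Idx2 blockIdx, Idx2 blockDim, Idx2 threadIdx) {
    // A thread ID always lies inside its own block.
    assert(threadIdx.x < blockDim.x && threadIdx.y < blockDim.y);
    Idx2 g = { blockIdx.x * blockDim.x + threadIdx.x,
               blockIdx.y * blockDim.y + threadIdx.y };
    return g;
}
```

With 4 x 3 blocks, thread (0,0) of block (1,1) maps to global position (4,3), and thread (3,2) of block (2,1) maps to (11,5), the last element of a 12 x 6 domain.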

kayvonf

@maxdecmeridius: That seems like a nice definition to me.

If we want to be very precise, there are a few ways that threads in different thread blocks in a single kernel launch can interact through CUDA's global memory address space, and the last two thought experiments in this lecture (beginning on slide 75) get at which interactions are sound.