Slide 57 of 62
yihuaz

I have a question about the statement "Threads in thread-block are concurrent." If a CUDA thread block spans 4 warps and we distribute work unevenly among its threads, will some warps have to wait for the others?

zhanif

I believe so, yes. What I gather from the discussion on earlier slides is that the threads in a warp run like the instances of an ISPC gang and complete at the same time. The same cannot be assumed about multiple warps. I think the statement means only that the threads in a block run concurrently, not that they run in lockstep.
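To make the point about multiple warps concrete, here is a toy cost model in Python. It is an assumption for illustration, not how an SM actually schedules warps: it ignores that warps share the SM's execution units and are interleaved by the scheduler. Each warp takes as long as its slowest lane. Without a barrier, a warp that finishes early simply has nothing left to do, and the block is done when its slowest warp is done; with a `__syncthreads()`-style barrier after each phase, faster warps stall until the slowest warp of that phase arrives. The helper names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def warp_times(work):
    """Per-warp time for one phase (toy model): lanes of a warp run in
    lockstep, so a warp takes as long as its slowest lane."""
    return [max(work[i:i + WARP_SIZE]) for i in range(0, len(work), WARP_SIZE)]

def block_time_no_barrier(phases):
    """No barriers: each warp runs all phases back to back on its own,
    and the block finishes when its slowest warp finishes."""
    per_phase = [warp_times(p) for p in phases]
    return max(sum(ts) for ts in zip(*per_phase))

def block_time_with_barriers(phases):
    """A barrier after each phase: every warp waits for the slowest warp
    of that phase before any warp starts the next phase."""
    return sum(max(warp_times(p)) for p in phases)

# A 128-thread block is 4 warps. Phase 1 overloads warp 3; phase 2 overloads warp 0.
phase1 = [1] * 96 + [4] * 32
phase2 = [4] * 32 + [1] * 96
phases = [phase1, phase2]

print(warp_times(phase1))                # [1, 1, 1, 4]
print(block_time_no_barrier(phases))     # 5
print(block_time_with_barriers(phases))  # 8
```

In this model the imbalance alone only delays the block's completion (5 units instead of the 2 a perfectly balanced split would need), while the barriers additionally force warps to idle while waiting for each other (8 units).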

jon

Threads in a warp execute in lockstep, SIMD style, even though we write the CUDA kernel in SPMD style as a program for a single thread. So within a warp, the real source of load imbalance is conditional execution (we discussed this from an ISPC perspective here: http://15418.courses.cs.cmu.edu/spring2014/lecture/basicarch/slide_028). Threads in a warp can't really finish before the other threads in that warp, but the same divergence issues as with ISPC's SIMD execution apply here.
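The cost of a conditional inside one warp can be sketched with a similar toy model (again an assumption, with made-up per-branch costs). For `if (cond) A else B`, the warp issues each branch that at least one lane takes, with the other lanes masked off, as an ISPC gang does; a divergent warp therefore pays for both branches. `warp_branch_time` and `utilization` are hypothetical helpers.

```python
def warp_branch_time(conds, then_cost, else_cost):
    """Toy model of one warp executing `if (cond) A else B`: each branch
    taken by at least one lane is issued for the whole warp."""
    time = 0
    if any(conds):        # some lane takes the `then` branch
        time += then_cost
    if not all(conds):    # some lane takes the `else` branch
        time += else_cost
    return time

def utilization(conds, then_cost, else_cost):
    """Fraction of lane-time slots doing useful (unmasked) work."""
    useful = sum(then_cost if c else else_cost for c in conds)
    return useful / (len(conds) * warp_branch_time(conds, then_cost, else_cost))

uniform = [True] * 32                         # every lane takes the same branch
divergent = [i % 2 == 0 for i in range(32)]   # even lanes vs. odd lanes

print(warp_branch_time(uniform, 10, 10))      # 10
print(warp_branch_time(divergent, 10, 10))    # 20
print(utilization(divergent, 10, 10))         # 0.5
```

All 32 lanes still finish together in both cases; divergence shows up as extra time for the whole warp, not as some lanes finishing early.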

aew

The main point from the comments above is that the threads in a warp all finish at the same time, like the instances of an ISPC gang, so conditional execution can lead to worse performance than expected. A thread block resembles an ISPC gang only in the weaker sense that all of its threads run concurrently. Unlike the lanes of a gang (or of a warp), the warps of a block do not run in lockstep, so nothing implies that multiple warps finish at the same time.