If we wanted to think of CUDA in ISPC terms, blocks are similar to tasks, and each thread is similar to an individual SIMD lane.
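As a rough sketch of that mapping (illustrative names, not from the original post): launching a grid of blocks is analogous to spawning ISPC tasks, and each thread handles one index much like a program instance handles one lane.

```cuda
// Sketch: a SAXPY kernel illustrating the hierarchy.
// Each block is roughly analogous to an ISPC task; each thread
// handles one element, like a single SIMD lane / program instance.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    if (i < n)                                      // guard the tail
        y[i] = a * x[i] + y[i];
}

// Launch enough 256-thread blocks to cover n elements:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```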
eosofsky
To add on to @apadwekar's comment, a CUDA warp is similar to an ISPC gang (each CUDA thread in a warp is running in a SIMD lane, much like an instance in an ISPC gang).
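To make the warp/gang analogy concrete (a sketch, assuming the warp shuffle intrinsics from CUDA's API): the 32 threads of a warp can exchange values with each other much like the lanes of one ISPC gang in a cross-lane reduction.

```cuda
// Sketch: a warp-level sum, analogous to a cross-gang reduction
// in ISPC. The 32 threads of one warp pass values to each other
// with shuffle intrinsics, just as lanes of one SIMD gang would.
__inline__ __device__ float warpReduceSum(float val) {
    // 0xffffffff: all 32 lanes of the warp participate.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 ends up holding the warp-wide sum
}
```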
username
So I know that the __shared__ keyword specifies that the memory being allocated is shared by the block, but how would you specify whether you want to allocate memory on a per-thread basis versus a global basis?
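For reference (a sketch, not part of the original question): ordinary local variables inside a kernel are per-thread, __shared__ arrays are per-block, and global memory is either allocated from the host with cudaMalloc or declared with __device__.

```cuda
__device__ float g_scale = 2.0f;      // global: one copy, visible to all threads

__global__ void example(float *out) { // 'out' points to global memory
                                      // allocated via cudaMalloc on the host
    float t = threadIdx.x * g_scale;  // local variable: private to each thread
    __shared__ float buf[256];        // shared: one copy per thread block
    buf[threadIdx.x] = t;
    __syncthreads();                  // wait for all threads in the block
    out[blockIdx.x * blockDim.x + threadIdx.x] = buf[threadIdx.x];
}
```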
o_o
So, just summarizing the thread hierarchy: each thread runs the kernel for one index, warps are groups of threads that execute the same instruction in lockstep, and thread blocks are groups of warps that have access to the same shared memory. Is this correct?
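That hierarchy can be sketched directly in code (illustrative kernel name; warps are carved out of a block in threadIdx order, 32 threads each):

```cuda
// Sketch: where a thread sits in the hierarchy.
__global__ void whereAmI(int *ids) {
    int globalId    = blockIdx.x * blockDim.x + threadIdx.x;
    int warpInBlock = threadIdx.x / warpSize;  // which warp within this block
    int lane        = threadIdx.x % warpSize;  // which lane within that warp
    // Threads sharing (blockIdx.x, warpInBlock) issue the same
    // instruction together; threads sharing blockIdx.x can also
    // communicate through the block's __shared__ memory.
    ids[globalId] = lane;
}
```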