Question: Notice at this point thread blocks 1, 3, 4, and 5 are running on the GPU concurrently. (We've started running block 5 before blocks 3 and 4 have completed.) Is this correct? Why?
BigFish
It is correct. Threads in block 1 may suffer more from divergent execution, so that block can take longer to finish than blocks that started after it. In addition, the block scheduler can start blocks in any order.
yuel1
This is correct because, in the CUDA abstraction, thread blocks have to be independent of each other, so the result cannot depend on which blocks run first or which run concurrently.
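To make the independence point concrete, here is a minimal host-side sketch in plain C++ (not real CUDA, and not how the hardware scheduler is implemented). It assumes each "block" only writes its own slice of the output and reads nothing produced by another block. The names `run_block`, `run_grid`, `order_does_not_matter` and the sizes `kNumBlocks` and `kThreadsPerBlock` are made up for illustration. It only models the order in which blocks are started, not true concurrency, but it shows why a shuffled schedule gives the same answer as running blocks 0, 1, 2, ... in order.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr int kNumBlocks = 6;
constexpr int kThreadsPerBlock = 4;

// Work done by one "thread block": it writes only to its own slice of the
// output and never reads anything written by another block.
void run_block(int block_id, const std::vector<int>& in, std::vector<int>& out) {
    for (int t = 0; t < kThreadsPerBlock; ++t) {
        int i = block_id * kThreadsPerBlock + t;  // global "thread" index
        out[i] = 2 * in[i] + 1;
    }
}

// Run every block exactly once, in the order given by `order`.
std::vector<int> run_grid(const std::vector<int>& in, const std::vector<int>& order) {
    assert(static_cast<int>(order.size()) == kNumBlocks);
    std::vector<int> out(in.size(), 0);
    for (int block_id : order) {
        run_block(block_id, in, out);
    }
    return out;
}

// Compare in-order execution against a randomly shuffled schedule.
bool order_does_not_matter(unsigned seed) {
    std::vector<int> in(kNumBlocks * kThreadsPerBlock);
    std::iota(in.begin(), in.end(), 0);

    std::vector<int> order(kNumBlocks);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
    std::vector<int> expected = run_grid(in, order);

    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);  // arbitrary schedule
    return run_grid(in, order) == expected;
}
```

Calling `order_does_not_matter` with any seed should return `true` here, because no block depends on another. If `run_block` instead read `out` values from a neighboring block, the result could change with the schedule, which is exactly the kind of dependence the abstraction rules out.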