Slide 56 of 60
mingf

It is clear that there are at least two factors that affect whether thread blocks can be scheduled onto a core.

  1. The amount of shared memory the block requires. When a given core cannot reserve enough shared memory for a block, no more blocks can be scheduled onto that core.

  2. The number of execution contexts, i.e., the number of threads the core can hold at the same time. When there are not enough execution contexts to hold all the threads of a block, the block cannot be scheduled onto that core.
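To make the two limits concrete, here is a minimal sketch (the per-core resource numbers below are illustrative assumptions, not the specs of any particular GPU): a block is schedulable only if both the shared-memory limit and the execution-context limit allow it, so the number of resident blocks is the tighter of the two bounds.

```python
# Hypothetical per-core resource limits (illustrative values only).
SHARED_MEM_PER_CORE = 48 * 1024   # bytes of shared memory per core
MAX_THREAD_CONTEXTS = 2048        # execution contexts (threads) per core

def max_resident_blocks(threads_per_block, shared_mem_per_block):
    """Blocks that fit on one core: the tighter of the two resource limits."""
    if shared_mem_per_block > 0:
        by_shared_mem = SHARED_MEM_PER_CORE // shared_mem_per_block
    else:
        by_shared_mem = float("inf")  # shared memory is not a constraint
    by_contexts = MAX_THREAD_CONTEXTS // threads_per_block
    return min(by_shared_mem, by_contexts)

# A block of 256 threads using 16 KB of shared memory:
# shared memory allows 3 blocks, contexts allow 8 -> 3 resident blocks.
print(max_resident_blocks(256, 16 * 1024))  # -> 3
```

Note how either factor alone can become the bottleneck: with no shared-memory usage, the same 256-thread block would be limited only by contexts (8 blocks).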

Question: Are there any other factors that could affect the schedulability of cores?

kk

If we have a NUMA processor, then the physical location of the core might be of interest to the scheduler.

doodooloo

If a core can run 2 warps at the same time, can it run warps from different blocks at the same time? Specifically: 1) what if every block consists of only 32 threads (1 warp)? 2) what if every block consists of 2 warps, but one of them finishes first?

kayvonf

@mingf: In your post you mentioned the "schedulability of cores". Can you clarify what you meant here? Did you mean "how CUDA thread blocks can be scheduled" on a GPU's cores?

kayvonf

@doodooloo: Absolutely. The GTX 680 SMX core is able to run up to 64 warps. These warps may come from multiple thread blocks. So yes, in your case (1), two thread blocks, each mapping to one warp, could run concurrently on the core. However, I did not understand what you meant in (2). Could you please clarify?
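The warp arithmetic here can be sketched as follows (the 64-warp SMX limit is from the post above; block sizes are illustrative, and this ignores the shared-memory and register limits discussed earlier):

```python
WARPS_PER_CORE = 64   # GTX 680 SMX: up to 64 resident warps
WARP_SIZE = 32        # threads per warp

def resident_blocks(threads_per_block):
    """Whole blocks resident at once, considering the warp limit only."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    return WARPS_PER_CORE // warps_per_block

print(resident_blocks(32))   # 1-warp blocks: up to 64 blocks resident
print(resident_blocks(256))  # 8-warp blocks: up to 8 blocks resident
```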

doodooloo

@kayvonf: in (2), I meant: if each block maps to two warps and the core can only run two warps at the same time, can the core run 1 warp from each of two different blocks?

kayvonf

No it cannot. The SMX core will only run one thread block concurrently in this case. All the threads in a thread block are run concurrently on the same core. To better understand why, take a look at the thought experiment on slide 48 of the GPU programming lecture.

mingf

@kayvonf: By "schedulability of cores", I meant whether a given core has enough resources to hold a block. Yes, it would be better to ask how CUDA thread blocks can be scheduled.

admintio42

How are warps scheduled? For instance, if my program uses 64 threads per block, and my chip has 2 processors, each with 32-wide SIMD and execution contexts for 2 warps of 32 threads each, would my program run on both processors, each using its 32-wide SIMD units, or on one processor running two warps?