Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Slide 51 of 54

arjunh

Anyone want to try to answer this question? For reference, atomicAdd(int* x, int val) atomically adds val to the value stored at x and returns the old value that was stored at x.
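To make the "returns the old value" behavior concrete, here is a minimal host-side sketch in Python. It is not CUDA: `AtomicInt` and `atomic_add` are hypothetical names made up for illustration, and a lock stands in for the hardware's indivisible read-modify-write. One consequence worth noticing is that adding 0 acts as an atomic read, which is presumably how a wait loop on the slide would poll the flag.

```python
import threading


class AtomicInt:
    """Toy stand-in for an int in GPU global memory (hypothetical, not a CUDA API)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def atomic_add(self, val):
        # Read, add, and write back as one indivisible step, then
        # return the value that was stored BEFORE the addition.
        with self._lock:
            old = self._value
            self._value = old + val
            return old


if __name__ == "__main__":
    flag = AtomicInt(0)
    print(flag.atomic_add(0))  # adding 0 only reads the value: prints 0
    print(flag.atomic_add(1))  # prints 0 (the old value); flag now holds 1
    print(flag.atomic_add(0))  # prints 1
```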

mangocourage

One possible outcome: block N is scheduled before block 0, so the flag is never set and the GPU never finishes running block N.

ericwang

Because the single core can run only one block at a time, there are two possible schedules:

1. atomicAdd in block 0 executes before atomicAdd in block N.
2. atomicAdd in block N executes before atomicAdd in block 0.

In the first case, myFlag is incremented before block N starts, so block N's loop exits immediately and its "do stuff" runs.

In the second case, the loop never exits, as @mangocourage pointed out, so "do stuff" never runs in either block.

If we had two cores, or a core that could hold more than one block at a time, this problem would not exist, right?
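The two schedules, and the two-slot variant asked about at the end, can be tried out in a small Python toy model. Everything here is an assumption made for illustration: the slide's kernel is not reproduced in this thread, so `block_0` and `block_n` only mimic what the comments describe (block 0 sets myFlag, block N spins until it is nonzero), `run` is a made-up scheduler that never evicts a resident block, and "hung" just means the step budget ran out.

```python
def block_0(mem):
    mem["myFlag"] += 1           # models atomicAdd(&myFlag, 1)
    yield "progress"             # then "do stuff"


def block_n(mem):
    while mem["myFlag"] == 0:    # models polling the flag with atomicAdd(&myFlag, 0)
        yield "spin"
    yield "progress"             # then "do stuff"


def run(dispatch_order, slots, max_steps=1000):
    """Run blocks on one core that can hold `slots` blocks at once.

    Blocks become resident in dispatch order and are never preempted;
    resident blocks take turns executing one step each.
    """
    mem = {"myFlag": 0}
    pending = [(name, make(mem)) for name, make in dispatch_order]
    resident, finished = [], []
    for _ in range(max_steps):
        # Fill free slots; a block leaves its slot only by finishing.
        while pending and len(resident) < slots:
            resident.append(pending.pop(0))
        if not resident:
            return finished, "completed"
        for entry in list(resident):
            name, gen = entry
            try:
                next(gen)
            except StopIteration:
                resident.remove(entry)
                finished.append(name)
    return finished, "hung"      # step budget exhausted: treated as a hang


if __name__ == "__main__":
    b0, bn = ("block 0", block_0), ("block N", block_n)
    print(run([b0, bn], slots=1))  # (['block 0', 'block N'], 'completed')
    print(run([bn, b0], slots=1))  # ([], 'hung')
    print(run([bn, b0], slots=2))  # (['block 0', 'block N'], 'completed')
```

In this model, a second slot does let both blocks finish. That only shows the hang disappears when both blocks happen to be resident together; CUDA makes no promise about the order in which blocks run or about which blocks run concurrently, so the program would still be relying on the scheduler's behavior.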

Olorin

This is a very different execution model from pthreads, as I understand it. In a pthreads-like model, you'll have a context switch to a different thread every so often, but here it seems we don't switch away from a block until it completes. That could be a problem if the slowest block happens to get scheduled first: every other block will then finish much later than it would otherwise. Is there any way to get CUDA to switch threads every so often?

Perhaps I'm not quite understanding the block abstraction here (and this question might show that). If that's the case, can someone clarify?
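The cost of scheduling the slowest block first is easy to see with a little arithmetic. The sketch below assumes a single core that runs one block to completion before starting the next; the durations (one block of 10 time units, three of 1 unit) are invented purely for illustration.

```python
from itertools import accumulate


def completion_times(durations):
    # With run-to-completion on one core, each block finishes at the
    # running total of the durations of all blocks dispatched up to it.
    return list(accumulate(durations))


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    slow_first = completion_times([10, 1, 1, 1])
    fast_first = completion_times([1, 1, 1, 10])
    print(slow_first, mean(slow_first))  # [10, 11, 12, 13] 11.5
    print(fast_first, mean(fast_first))  # [1, 2, 3, 13] 4.75
```

The last block finishes at time 13 either way, so the total running time of the kernel is unchanged in this model. What changes is how long the short blocks wait before they complete.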

regi

@Olorin: If I understand correctly, a core can switch between blocks only when it has enough warp execution contexts and shared memory to hold several blocks at once (as in the convolve example). GPUs usually have enough cores to avoid the problem. However, given only one core with enough resources for a single block, the scenario you mentioned (the slowest block scheduled first) is possible.
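As a rough illustration of "enough warps and shared memory for multiple blocks", the Python sketch below counts how many blocks could be resident on one core at the same time. The function name and all the numbers (64 warp contexts, 48 KB of shared memory, 4 warps per block) are assumptions chosen for the example rather than the specification of any particular GPU, and real hardware has further limits, such as registers and a cap on blocks per core, that are left out here.

```python
def resident_blocks(warp_slots, shared_mem_bytes,
                    warps_per_block, shared_bytes_per_block):
    """How many blocks fit on one core at once (simplified model)."""
    by_warps = warp_slots // warps_per_block
    if shared_bytes_per_block == 0:
        return by_warps  # shared memory places no limit in this case
    by_shared = shared_mem_bytes // shared_bytes_per_block
    # The scarcer of the two resources decides.
    return min(by_warps, by_shared)


if __name__ == "__main__":
    # Small per-block shared memory: warp contexts are the limit.
    print(resident_blocks(64, 48 * 1024, 4, 520))        # 16
    # One block needs 40 KB of shared memory: only one block fits,
    # so the core has nothing else to switch to while it runs.
    print(resident_blocks(64, 48 * 1024, 4, 40 * 1024))  # 1
```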

jiajunbl

I think the question we're looking at here is whether there is a timer-triggered context switch, as in a Unix operating system. I asked a TA, and it seems the answer is no. So it seems a context switch is triggered only by blocking calls.