iamk_d___g

Question: does a simultaneous read and write count as a conflict, or do reads and writes use different wires?

mchoquet

Unless I'm missing something, all 32 threads do the same thing on each clock cycle, so that situation (some threads reading and some writing) can't arise.

mchoquet

Wait, that wasn't quite right. If this memory is shared across the chip then all currently running warps could access it. However, I don't think they can access it during the same clock tick since we only have 1 batch of 32 load/store units that all the warps have to share. I could be wrong again though.

kayvonf

@mchoquet. This is a GTX 480-specific example. You're referring to the GTX 680. For the sake of understanding this particular example of contention, consider a single banked on-chip memory (as illustrated above) and a single 32-wide warp trying to access it.

Shared memory address x is serviced by bank x mod 32. Each bank can service one request per cycle, unless all requests are to the exact same address (special case 1).

The overall point is that this is an example of contention playing a big role in the actual cost of an operation. 32 loads with no contention are completed in one cycle. If contention is present, the 32 loads could take much longer.
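A minimal Python sketch of the cost model described above may help make it concrete. It is a toy model rather than a description of the actual hardware: it assumes addresses are word indices, that bank = address mod 32, and that the only special case is every thread reading the same address. The name `access_cycles` is made up for illustration.

```python
from collections import Counter

NUM_BANKS = 32  # number of shared-memory banks assumed in this thread


def access_cycles(addresses):
    """Cycles for one warp-wide load under the simplified model.

    addresses: one word index per thread in the warp.
    """
    if not addresses:
        return 0
    if len(set(addresses)) == 1:
        return 1  # special case: all threads read the exact same address
    # Each bank services one request per cycle, so the busiest bank
    # determines how long the whole load takes.
    per_bank = Counter(a % NUM_BANKS for a in addresses)
    return max(per_bank.values())


warp = range(32)
print(access_cycles([i for i in warp]))       # 1: each thread hits its own bank
print(access_cycles([3 * i for i in warp]))   # 1: still 32 distinct banks
print(access_cycles([16 * i for i in warp]))  # 16: only banks 0 and 16 are used
print(access_cycles([0 for _ in warp]))       # 1: same-address special case
```

Under this model the worst case is a stride of 32, where all 32 loads land on bank 0 at different addresses and the access takes 32 cycles.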

black

I really like this example. There are too many access patterns for NVIDIA to handle them all in hardware. They chose to special-case the situation where all threads access the same address, but not the situation where all threads access only two banks. Maybe the overhead of a bulk load is too high. And [3*index] doesn't have the problem because 3 and 32 share no common factor (the same holds for any odd stride), so we are just lucky.
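The stride argument can be checked with a short Python sketch. It assumes 32 banks, word-indexed addresses, and a warp of 32 threads where thread i accesses index stride * i; the helper name `threads_per_bank` is hypothetical. For strides of 1 or more, the number of threads that collide on the busiest bank comes out equal to gcd(stride, 32).

```python
from math import gcd

NUM_BANKS = 32  # assumed bank count; the warp size is also taken to be 32


def threads_per_bank(stride):
    """Largest number of threads hitting one bank when thread i reads stride * i."""
    banks = [(stride * i) % NUM_BANKS for i in range(NUM_BANKS)]
    return max(banks.count(b) for b in set(banks))


for stride in (1, 2, 3, 5, 6, 16, 32):
    # The busiest bank is shared by exactly gcd(stride, 32) threads.
    assert threads_per_bank(stride) == gcd(stride, NUM_BANKS)
    print(stride, threads_per_bank(stride))
```

Note that a stride of 6 is not a factor of 32 either, yet it still produces a 2-way conflict, because 6 and 32 share the factor 2.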