jpaulson

Block groups are an implementation-level detail.

There is block-level shared memory and some synchronization operations at the per-block level.

monster

Here both the blocks and the threads per block are arranged as 2-dimensional arrays. Each time you launch a kernel you specify the number of blocks and the number of threads per block; the total number of threads you use is the number of blocks times the number of threads per block.
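
A minimal sketch of such a 2D launch (the kernel name, matrix size, and block shape below are just illustrative choices, not from the slide). The total thread count is numBlocks.x * numBlocks.y * threadsPerBlock.x * threadsPerBlock.y:

    #include <cuda_runtime.h>

    // Hypothetical kernel: each thread handles one element of an N x N matrix.
    __global__ void matrixAddKernel(float* C, const float* A, const float* B, int N) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < N && col < N)
            C[row * N + col] = A[row * N + col] + B[row * N + col];
    }

    int main() {
        const int N = 1024;
        float *dA = nullptr, *dB = nullptr, *dC = nullptr;
        cudaMalloc(&dA, N * N * sizeof(float));
        cudaMalloc(&dB, N * N * sizeof(float));
        cudaMalloc(&dC, N * N * sizeof(float));
        // ... initialization of dA and dB omitted ...

        // 2D block of 16x16 = 256 threads; 2D grid sized (rounding up) to cover
        // the whole matrix, so total threads >= N * N.
        dim3 threadsPerBlock(16, 16);
        dim3 numBlocks((N + threadsPerBlock.x - 1) / threadsPerBlock.x,
                       (N + threadsPerBlock.y - 1) / threadsPerBlock.y);

        matrixAddKernel<<<numBlocks, threadsPerBlock>>>(dC, dA, dB, N);
        cudaDeviceSynchronize();

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }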

Xelblade

threadIdx is akin to program index. blockDim is akin to program count. blockIdx is akin to task ID.
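
A small sketch of that mapping (assuming "program index / program count / task ID" refer to the ISPC-style abstractions from earlier in the course; the kernel itself is made up for illustration):

    // threadIdx.x  ~ program index  (which lane within this block)
    // blockDim.x   ~ program count  (how many lanes per block)
    // blockIdx.x   ~ task ID        (which block of the launch)
    __global__ void scaleKernel(float* data, float s, int N) {
        // Global index: each block covers a contiguous chunk of blockDim.x elements.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N)
            data[i] *= s;
    }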

xiaowend

One advantage of keeping block-level memory is that this kind of shared memory is cheaper to access than off-chip global memory.

Tao

The shared memory is on-chip, so it is smaller but faster. The idea is similar to a cache in the memory hierarchy. Another benefit is that it allows the threads in a block to communicate via memory.

kayvonf

@Tao: Yes, you can think about the shared memory like a cache since data in shared memory is resident in low-latency, on-chip storage. The primary difference between shared memory and a traditional cache is that a program has to manually load data into shared memory (software manages what data is stored in the shared memory). In contrast, a cache is largely transparent to software: a program just issues loads and stores and the hardware manages what data is stored in a cache.
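
A minimal sketch of what "software manages what data is stored" looks like in practice (the 1D averaging kernel and its sizes are illustrative): each block explicitly copies the inputs it needs into a __shared__ array, synchronizes, and only then computes.

    // Illustrative 1D averaging kernel: each block explicitly stages the data
    // it needs into shared memory before computing, rather than relying on a
    // hardware cache to pull it in transparently.
    #define THREADS_PER_BLOCK 128

    __global__ void blurKernel(float* out, const float* in, int N) {
        // Software-managed on-chip storage: the block's chunk plus a 2-element halo.
        __shared__ float support[THREADS_PER_BLOCK + 2];

        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        // Each thread loads one element; the first two threads also load the halo.
        support[threadIdx.x] = (gid < N) ? in[gid] : 0.0f;
        if (threadIdx.x < 2) {
            int haloIdx = gid + THREADS_PER_BLOCK;
            support[THREADS_PER_BLOCK + threadIdx.x] =
                (haloIdx < N) ? in[haloIdx] : 0.0f;
        }

        // All threads in the block must finish loading before any thread reads.
        __syncthreads();

        if (gid < N)
            out[gid] = (support[threadIdx.x] + support[threadIdx.x + 1] +
                        support[threadIdx.x + 2]) / 3.0f;
    }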

Another name for the storage used to implement shared memory is a "scratchpad".

If you read more closely about the details of an NVIDIA GPU, you'll find that each core has a fixed amount of addressable on-chip storage. This storage can be dynamically configured into a scratchpad region that is used for CUDA shared memory allocations and a part that functions as an L1 cache for off-chip global memory. On most recent NVIDIA GPUs (certainly true of the 480 and 6xx GPUs in the lab), there's 64 KB of storage per core, of which at least 16 KB must be reserved for shared memory.
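
The CUDA runtime lets a program express a preference for how that on-chip storage is split. A minimal sketch (the kernel here is only a placeholder):

    #include <cuda_runtime.h>

    __global__ void myKernel(float* data, int N) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N) data[i] += 1.0f;
    }

    void configureCachePreference() {
        // Ask the driver to favor a larger L1 cache (and a smaller shared
        // memory region) for this particular kernel. Other options include
        // cudaFuncCachePreferShared and cudaFuncCachePreferNone.
        cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);

        // Or set a device-wide preference that applies to all kernels.
        cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
    }

These are only hints: the hardware supports a limited set of split configurations, so the runtime picks the closest one it can honor.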