ghotz

Remember: the closer the memory is to the thread, the cheaper accesses to it are.

max

All threads can share memory if they are in the same block.

There is one instance of shared memory per block, one instance of local memory per thread, and a single instance of global memory that all threads can read and write.
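To make this concrete, here is a minimal kernel sketch (not from the lecture; the names and the 256-thread block size are assumptions) in which each declaration lives in one of the three spaces described above:

```cuda
// Illustrative only: one variable in each level of the CUDA memory hierarchy.
__device__ float globalCounter = 0.0f;   // device global memory: one copy, visible to all threads

__global__ void memorySpaces(const float* in, float* out) {
    __shared__ float tile[256];          // per-block shared memory: one copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;  // assumes the launch exactly covers the array
    float x = in[i];                     // 'x' is per-thread local memory (typically a register)

    tile[threadIdx.x] = x;               // threads in the same block can exchange data here
    __syncthreads();                     // wait until every thread in the block has written

    out[i] = tile[threadIdx.x] + globalCounter;
}
```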

miko

Question:

Does the 'device global memory' in this diagram refer to the memory of the GPU? If so, does the GPU's memory actually behave in exactly the same manner as main memory does (perform caching and such) or are there some small discrepancies between the two?

kayvonf

Yes. In modern GPUs, device global memory corresponds to high-performance GDDR5 DRAM resident on the GPU board (but not on the chip). You can think of this memory just as you think of the main system memory accessible to a CPU (typically DDR3 these days). The GPU does cache part of this address space, although GPU caches tend to be smaller than the caches on a CPU.
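From the programmer's point of view, device global memory is the memory the host allocates with cudaMalloc and fills with cudaMemcpy. A minimal host-side sketch (variable names and sizes are placeholders):

```cuda
#include <cuda_runtime.h>

int main() {
    const int N = 1 << 20;
    float* d_buf = nullptr;

    // cudaMalloc reserves space in device global memory (the DRAM on the GPU board)
    cudaMalloc(&d_buf, N * sizeof(float));

    // Data must be copied from host DRAM into the GPU's DRAM before kernels can read it
    float* h_buf = new float[N];
    cudaMemcpy(d_buf, h_buf, N * sizeof(float), cudaMemcpyHostToDevice);

    // ... launch kernels that read/write d_buf ...

    cudaFree(d_buf);
    delete[] h_buf;
    return 0;
}
```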

DanceWithDragon

To conclude this part, here is a summary table. Suppose we have M blocks, and each block has N threads.

Memory type                | Instances | Accessed by                     | Size   | Speed
Device global memory       | 1         | all M*N threads                 | Large  | Slow
Per-block shared memory    | M         | the N threads in the same block | Medium | Medium
Per-thread private memory  | M*N       | 1 thread                        | Small  | Fast
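One consequence of the size/speed tradeoff in this table is the common pattern of staging data from slow device global memory into fast per-block shared memory before working on it. A rough sketch of a per-block sum that does this (the kernel name is made up, and 256 threads per block is assumed):

```cuda
__global__ void blockSum(const float* in, float* blockResults) {
    __shared__ float scratch[256];             // assumes N = 256 threads per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[threadIdx.x] = in[i];              // one read from slow global memory per thread
    __syncthreads();

    // Tree reduction performed entirely in fast per-block shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockResults[blockIdx.x] = scratch[0]; // one write to global memory per block
}
```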