ghotz

Remember: the closer the memory is to the thread, the cheaper accesses to it are.

max

All threads can share memory if they are in the same block.

There is one instance of shared memory per block, one instance of local memory per thread, and a single instance of global memory that all threads can read and write.
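To make this concrete, here is a minimal kernel sketch (not from the lecture; the names and the 256-thread block size are assumptions) in which each declaration lives in one of the three spaces described above:

```cuda
// Illustrative only: one variable in each level of the CUDA memory hierarchy.
__device__ float globalCounter = 0.0f;   // device global memory: one copy, visible to all threads

__global__ void memorySpaces(const float* in, float* out) {
    __shared__ float tile[256];          // per-block shared memory: one copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;  // assumes the launch exactly covers the array
    float x = in[i];                     // 'x' is per-thread local memory (typically a register)

    tile[threadIdx.x] = x;               // threads in the same block can exchange data here
    __syncthreads();                     // wait until every thread in the block has written

    out[i] = tile[threadIdx.x] + globalCounter;
}
```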

miko

Question:

Does the 'device global memory' in this diagram refer to the memory of the GPU? If so, does the GPU's memory actually behave in exactly the same manner as main memory does (perform caching and such) or are there some small discrepancies between the two?

kayvonf

Yes. In modern GPUs, device global memory corresponds to high-performance GDDR5 DRAM resident on the GPU board (but not on the chip). You can think of this memory just as you think of the main system memory accessible to a CPU (typically DDR3 these days). The GPU does cache part of this address space, although GPU caches tend to be smaller than the caches on a CPU.
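From the programmer's point of view, device global memory is the memory the host allocates with cudaMalloc and fills with cudaMemcpy. A minimal host-side sketch (variable names and sizes are placeholders):

```cuda
#include <cuda_runtime.h>

int main() {
    const int N = 1 << 20;
    float* d_buf = nullptr;

    // cudaMalloc reserves space in device global memory (the DRAM on the GPU board)
    cudaMalloc(&d_buf, N * sizeof(float));

    // Data must be copied from host DRAM into the GPU's DRAM before kernels can read it
    float* h_buf = new float[N];
    cudaMemcpy(d_buf, h_buf, N * sizeof(float), cudaMemcpyHostToDevice);

    // ... launch kernels that read/write d_buf ...

    cudaFree(d_buf);
    delete[] h_buf;
    return 0;
}
```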

DanceWithDragon

To conclude this part, here is a summary table. Suppose we have M blocks, and each block has N threads.

Memory type                | Instances | Accessed by                     | Size   | Speed
Device global memory       | 1         | all M*N threads                 | Large  | Slow
Per-block shared memory    | M         | the N threads in the same block | Medium | Medium
Per-thread private memory  | M*N       | 1 thread                        | Small  | Fast
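One consequence of the size/speed tradeoff in this table is the common pattern of staging data from slow device global memory into fast per-block shared memory before working on it. A rough sketch of a per-block sum that does this (the kernel name is made up, and 256 threads per block is assumed):

```cuda
__global__ void blockSum(const float* in, float* blockResults) {
    __shared__ float scratch[256];             // assumes N = 256 threads per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[threadIdx.x] = in[i];              // one read from slow global memory per thread
    __syncthreads();

    // Tree reduction performed entirely in fast per-block shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockResults[blockIdx.x] = scratch[0]; // one write to global memory per block
}
```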