Does the blue SIMD load/store unit always take a warp context, since every warp needs to load and store data? Or is there a hidden context (and decoder) for only this purpose?
In the convolve program, we are explicitly loading the data first into the shared block memory and this shared memory is implemented by dividing the L1 cache on this picture, right? Doesn't the first read automatically loads the first line of data into the cache? Why do we need to explicitly load the data if we are already using a cache?
This comment was marked helpful 0 times.
spjames
If anybody is interested, the whitepaper on the GK104 (GTX 680/Kepler) chip goes into extensive detail of how the chip is structured, how warps are scheduled, what the memory bus/regions look like, and more.
The newer GK110 (GTX 780/Kepler) paper is not about the exact chip we are discussing, but goes into even more detail about how everything works (and describes some cool new features), and GK104 and GK110 are pretty similar.
Finally, if you're interested in history, the original Fermi (8800 GTX) whitepaper is pretty interesting. Fermi was the very first NVIDIA architecture that had a unified shader architecture, which allowed for complex programmable shaders (vertex, fragment, geometry, etc.) as well as CUDA.
Question:
Does the blue SIMD load/store unit always take a warp context, since every warp needs to load and store data? Or is there a hidden context (and decoder) for only this purpose?
In the convolve program, we are explicitly loading the data first into the shared block memory and this shared memory is implemented by dividing the L1 cache on this picture, right? Doesn't the first read automatically loads the first line of data into the cache? Why do we need to explicitly load the data if we are already using a cache?
This comment was marked helpful 0 times.
If anybody is interested, the whitepaper on the GK104 (GTX 680/Kepler) chip goes into extensive detail of how the chip is structured, how warps are scheduled, what the memory bus/regions look like, and more.
This is the paper on the GK104:
http://www.geforce.com/Active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf
The newer GK110 (GTX 780/Kepler) paper is not about the exact chip we are discussing, but goes into even more detail about how everything works (and describes some cool new features), and GK104 and GK110 are pretty similar.
http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
Finally, if you're interested in history, the original Fermi (8800 GTX) whitepaper is pretty interesting. Fermi was the very first NVIDIA architecture that had a unified shader architecture, which allowed for complex programmable shaders (vertex, fragment, geometry, etc.) as well as CUDA.
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf
This comment was marked helpful 3 times.