Where is the compiled CUDA device binary stored? Is there any tradeoff between memory usage and read latency?
bharadwaj
From what I remember from OH a few weeks ago, CUDA device binaries are given their own separate buffer on the GPU (I believe there is one copy per block, though I'm not sure).
I'm not sure where the tradeoff you're suggesting would come from, though.
sampathchanda
Isn't it 8 bytes, instead of 'B' bytes, of local data per thread?
kayvonf
CUDA device binaries are instructions in GPU memory that are loaded by the processor as part of instruction fetch. (This is the same as regular CPU application binaries, which are bits in memory that the PC points to and the CPU fetches.)