What languages give you direct access to the cache line size? Is this something we can use for both C and CUDA?
tommywow
In the first method, all the per-thread variables are stored together, on the same line of cache. As a result, a lot of overhead is involved in ensuring cache coherence. In the second method, each line contains just one variable. Although it is less space efficient as it pads the rest of the cache line, it significantly reduces overhead, assuming only one processor access that line.
What languages give you direct access to the cache line size? Is this something we can use for both C and CUDA?
In the first method, all the per-thread variables are stored together, on the same line of cache. As a result, a lot of overhead is involved in ensuring cache coherence. In the second method, each line contains just one variable. Although it is less space efficient as it pads the rest of the cache line, it significantly reduces overhead, assuming only one processor access that line.