I don't really understand the point of the while loop and thread synchronization. Isn't each block responsible for exactly 1 / (15 * 12) of the array, so wouldn't each block exit after the first iteration of the while loop?
The synchronization is intended to synchronize the threads in the same block, since they may run at different speeds.
For example, the first call to __syncthreads() ensures that no thread proceeds to perform the task until startingIndex has been computed.
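Below is a minimal sketch of the pattern described above. It assumes a "persistent blocks" kernel in which thread 0 of each block claims the next chunk of the array from a global counter with atomicAdd and publishes it through a __shared__ startingIndex. The names nextChunk, doubleAll, doubleOnGpu and check, the doubling operation, and the values of THREADS_PER_BLK and BLOCKS_PER_CHIP are all hypothetical; the original kernel is not shown in this thread. The first __syncthreads() keeps threads from reading startingIndex before thread 0 has written it. The second keeps thread 0 from overwriting it on the next iteration while slower threads are still reading the current value.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS_PER_BLK 128  // hypothetical value
#define BLOCKS_PER_CHIP 30   // hypothetical value

// Hypothetical global work counter: index of the next unclaimed element.
__device__ int nextChunk;

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

__global__ void doubleAll(int* data, int n) {
    __shared__ int startingIndex;  // one copy per block
    while (true) {
        // Thread 0 claims the next THREADS_PER_BLK elements for the whole block.
        if (threadIdx.x == 0)
            startingIndex = atomicAdd(&nextChunk, THREADS_PER_BLK);
        // Barrier 1: no thread reads startingIndex before thread 0 has written it.
        __syncthreads();
        int start = startingIndex;
        // Every thread sees the same value, so the whole block exits together.
        if (start >= n) break;
        int i = start + threadIdx.x;
        if (i < n) data[i] *= 2;
        // Barrier 2: thread 0 must not overwrite startingIndex on the next
        // iteration while slower threads are still reading this one.
        __syncthreads();
    }
}

// Doubles h[0..n) on the GPU using BLOCKS_PER_CHIP persistent blocks.
void doubleOnGpu(int* h, int n) {
    assert(n >= 0);
    if (n == 0) return;
    int* d = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    check(cudaMalloc(&d, bytes));
    check(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    int zero = 0;
    check(cudaMemcpyToSymbol(nextChunk, &zero, sizeof(int)));  // reset the counter
    doubleAll<<<BLOCKS_PER_CHIP, THREADS_PER_BLK>>>(d, n);
    check(cudaGetLastError());
    check(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));  // also waits for the kernel
    check(cudaFree(d));
}
```

Because work is claimed dynamically in this sketch, a block may run the loop once or many times; it leaves only when the startingIndex it receives is past the end of the array.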
Is there a way to access the hardware information at runtime (to eliminate the predefined constants THREADS_PER_BLK and BLOCKS_PER_CHIP)?

@Elias You can get some information about the device using cudaGetDeviceProperties: http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__DEVICE_g5aa4f47938af8276f08074d09b7d520c.html
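A minimal sketch of replacing the two constants with values derived from cudaGetDeviceProperties. The LaunchConfig struct, queryLaunchConfig and the preferredThreads parameter are hypothetical; maxThreadsPerBlock, multiProcessorCount and maxThreadsPerMultiProcessor are real cudaDeviceProp fields. The blocksPerChip estimate counts thread capacity only, so treat it as an upper bound: it ignores register, shared-memory and per-SM block-count limits.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical runtime replacements for THREADS_PER_BLK and BLOCKS_PER_CHIP.
struct LaunchConfig {
    int threadsPerBlock;
    int blocksPerChip;
};

// Reads the device's properties and derives a launch configuration.
// preferredThreads is a hypothetical tuning value chosen by the caller.
LaunchConfig queryLaunchConfig(int device, int preferredThreads) {
    assert(preferredThreads > 0);
    cudaDeviceProp prop;
    cudaError_t e = cudaGetDeviceProperties(&prop, device);
    if (e != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
    LaunchConfig cfg;
    // Never exceed the hardware's per-block thread limit.
    cfg.threadsPerBlock = preferredThreads < prop.maxThreadsPerBlock
                              ? preferredThreads
                              : prop.maxThreadsPerBlock;
    // Upper bound on simultaneously resident blocks, from thread capacity only.
    cfg.blocksPerChip = prop.multiProcessorCount *
                        (prop.maxThreadsPerMultiProcessor / cfg.threadsPerBlock);
    return cfg;
}
```

The kernel would then be launched with <<<cfg.blocksPerChip, cfg.threadsPerBlock>>>. One caveat: a statically sized __shared__ array still needs a compile-time size, so if the kernel sizes shared memory by THREADS_PER_BLK, that part has to stay a constant or move to dynamically allocated shared memory. For a tighter per-kernel estimate, newer CUDA versions also provide cudaOccupancyMaxActiveBlocksPerMultiprocessor.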