Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Slide 52 of 54

Elias

Is there a way to query the hardware at runtime, so that the predefined constants THREADS_PER_BLK and BLOCKS_PER_CHIP are no longer needed?

cube

@Elias You can get some information about the device using cudaGetDeviceProperties.

http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__DEVICE_g5aa4f47938af8276f08074d09b7d520c.html
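As a rough sketch of how that could look: the program below queries device 0 with cudaGetDeviceProperties and derives runtime stand-ins for the two constants. The choice of device 0, the block size of 128, and the variable names threadsPerBlock, blocksPerSM, and blocksPerChip are assumptions made for illustration, not something taken from the slide. The blocks-per-SM figure is only an upper bound based on thread counts; register use, shared memory, and the per-SM limit on resident blocks can lower it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: query the first GPU in the system
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Assumption: the programmer still picks a block size, but checks it
    // against the device limit instead of trusting a hard-coded constant.
    int threadsPerBlock = 128;
    if (threadsPerBlock > prop.maxThreadsPerBlock)
        threadsPerBlock = prop.maxThreadsPerBlock;

    // Upper bound on resident blocks per SM, from thread counts alone.
    int blocksPerSM = prop.maxThreadsPerMultiProcessor / threadsPerBlock;

    // Runtime stand-in for BLOCKS_PER_CHIP: one slot per resident block.
    int blocksPerChip = prop.multiProcessorCount * blocksPerSM;

    std::printf("device:            %s\n", prop.name);
    std::printf("SM count:          %d\n", prop.multiProcessorCount);
    std::printf("threads per block: %d\n", threadsPerBlock);
    std::printf("blocks per SM:     %d (thread-count bound)\n", blocksPerSM);
    std::printf("blocks per chip:   %d\n", blocksPerChip);
    return 0;
}
```

The resulting blocksPerChip and threadsPerBlock values could then be passed as the launch configuration in place of the compile-time constants.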

ananyak

I don't really understand the point of the while loop and thread synchronization. Isn't each block responsible for exactly 1 / (15 * 12) of the array, so wouldn't each block exit after the first iteration of the while loop?

The synchronization is there to keep the threads within one block in step with each other, since they may run at different speeds.

For example, the first call to __syncthreads() ensures that no thread starts on its share of the work until startingIndex has been computed.
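On the while-loop half of the question, here is a minimal sketch, assuming the slide's kernel follows the usual persistent-threads pattern. Under that assumption a block is not handed a fixed 1 / (15 * 12) share of the array. It repeatedly claims the next chunk of THREADS_PER_BLK elements from a global counter and exits only when no chunks remain, so with a large array each block goes around the loop many times. The names workCounter and scaleChunks and the doubling operation are hypothetical; only startingIndex, THREADS_PER_BLK, and BLOCKS_PER_CHIP come from the discussion above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define THREADS_PER_BLK 128
#define BLOCKS_PER_CHIP (15 * 12)  // value quoted in the question above

__device__ int workCounter = 0;  // hypothetical: next unclaimed array index

// Hypothetical kernel: each block keeps claiming chunks until none are left.
__global__ void scaleChunks(int N, const float* input, float* output) {
    __shared__ int startingIndex;  // one copy per block
    while (true) {
        // One thread claims the next chunk on behalf of the whole block.
        if (threadIdx.x == 0)
            startingIndex = atomicAdd(&workCounter, THREADS_PER_BLK);
        // First barrier: nobody reads startingIndex before it is written.
        __syncthreads();

        // Every thread in the block sees the same value, so all exit together.
        if (startingIndex >= N)
            break;

        int index = startingIndex + threadIdx.x;
        if (index < N)
            output[index] = 2.0f * input[index];

        // Second barrier: thread 0 must not overwrite startingIndex for the
        // next iteration while slower threads are still reading it.
        __syncthreads();
    }
}

int main() {
    const int N = 1 << 20;  // 8192 chunks of 128 elements for 180 blocks
    std::vector<float> hostIn(N), hostOut(N);
    for (int i = 0; i < N; i++)
        hostIn[i] = static_cast<float>(i % 1000);

    float *devIn = nullptr, *devOut = nullptr;
    cudaMalloc(&devIn, N * sizeof(float));
    cudaMalloc(&devOut, N * sizeof(float));
    cudaMemcpy(devIn, hostIn.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    scaleChunks<<<BLOCKS_PER_CHIP, THREADS_PER_BLK>>>(N, devIn, devOut);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hostOut.data(), devOut, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Check every element on the host.
    int mismatches = 0;
    for (int i = 0; i < N; i++)
        if (hostOut[i] != 2.0f * hostIn[i]) mismatches++;
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(devIn);
    cudaFree(devOut);
    return mismatches == 0 ? 0 : 1;
}
```

In this sketch the launch creates only as many blocks as the chip can hold at once, and the counter, not the block index, decides which elements a block processes. That is why both the loop and the barriers are needed.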