Is there a particular reason we define THREADS_PER_BLK as 128?
I think it's based on how the thread scheduling works?
This is just an example. On compute capability 6.1, a block can hold at most 1024 threads in total, i.e. the product of its x, y and z dimensions cannot exceed 1024.
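To make the limit concrete, here is a rough host-side sketch (plain C++, so it also compiles as CUDA host code). It is not from the course code: the helper names `block_shape_is_valid` and `blocks_for` are made up for illustration, and the constants are the compute capability 6.1 limits as I understand them from the CUDA programming guide (1024 threads per block in total, at most 1024 in x and y, at most 64 in z, warp size 32). On the scheduling guess: threads are issued in warps of 32, so block sizes are usually chosen as a multiple of 32, and 128 is simply one common choice.

```cpp
// Assumed limits for compute capability 6.1 (check your device's actual values).
constexpr int kMaxThreadsPerBlock = 1024;  // cap on x * y * z
constexpr int kMaxBlockDimX = 1024;
constexpr int kMaxBlockDimY = 1024;
constexpr int kMaxBlockDimZ = 64;
constexpr int kWarpSize = 32;

// Hypothetical helper: true if a block of shape (x, y, z) respects the limits above.
constexpr bool block_shape_is_valid(int x, int y, int z) {
    return x >= 1 && y >= 1 && z >= 1 &&
           x <= kMaxBlockDimX && y <= kMaxBlockDimY && z <= kMaxBlockDimZ &&
           static_cast<long long>(x) * y * z <= kMaxThreadsPerBlock;
}

// Hypothetical helper: number of blocks needed to cover n elements
// when each block has threads_per_blk threads (ceiling division).
constexpr int blocks_for(int n, int threads_per_blk) {
    return (n + threads_per_blk - 1) / threads_per_blk;
}

// The example value from the thread: legal, and a whole number of warps.
constexpr int THREADS_PER_BLK = 128;
static_assert(block_shape_is_valid(THREADS_PER_BLK, 1, 1), "128 threads fits in a block");
static_assert(THREADS_PER_BLK % kWarpSize == 0, "128 is a multiple of the warp size");
```

For instance, `blocks_for(1000, THREADS_PER_BLK)` gives 8 blocks, `block_shape_is_valid(32, 32, 1)` is true (exactly 1024 threads), and `block_shape_is_valid(32, 32, 2)` is false because 2048 exceeds the per-block total.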