Just out of curiosity, what could be the reason that CUDA does not have grid-level synchronization? It seems it would be quite helpful to have that for the implementation of this algorithm.