I'm a bit confused about how this code works. As I see it, the "startingIndex" can be as high as (N-1) while the main loop is running. This means that "index" can be as high as (N-1) + (127) = N+126, but isn't this out of bounds?
iamk_d___g
@kkz: I think this code makes the assumption that N is a multiple of 128.
kayvonf
Yeah, it does. It would have become pretty messy if I put a boundary condition for that in there.
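For anyone following the index arithmetic, here is a minimal host-side C++ sketch of the issue and of one possible boundary condition. It is not the code from the slide: the chunk size of 128 is inferred from the offset 127 mentioned above, and the loop structure and the helper names `maxIndexUnguarded` and `maxIndexGuarded` are assumptions made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumed chunk size: the thread only mentions offsets up to 127.
constexpr int kChunk = 128;

// Largest value `index` reaches when all 128 offsets are used for every
// chunk whose startingIndex is below N (no boundary condition).
// Returns -1 when N == 0.
int maxIndexUnguarded(int N) {
    assert(N >= 0);
    int maxIndex = -1;
    for (int startingIndex = 0; startingIndex < N; startingIndex += kChunk) {
        for (int offset = 0; offset < kChunk; ++offset) {
            int index = startingIndex + offset;  // may exceed N - 1
            maxIndex = std::max(maxIndex, index);
        }
    }
    return maxIndex;
}

// Same traversal, but each offset checks `index < N` before touching data.
// This is the kind of boundary condition the original code leaves out.
int maxIndexGuarded(int N) {
    assert(N >= 0);
    int maxIndex = -1;
    for (int startingIndex = 0; startingIndex < N; startingIndex += kChunk) {
        for (int offset = 0; offset < kChunk; ++offset) {
            int index = startingIndex + offset;
            if (index >= N) break;  // skip the out-of-bounds tail
            maxIndex = std::max(maxIndex, index);
        }
    }
    return maxIndex;
}
```

With N = 257, `maxIndexUnguarded` reaches 383 (that is, N + 126), which is the worst case described above, while `maxIndexGuarded` stops at 256. With N = 256, a multiple of 128, both stop at 255, so the guard is only needed when N is not a multiple of the chunk size.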
bwasti
If CUDA was created to allow general computation on the GPU, why are its semantics block-based rather than closer to the semantics this program uses? The abstraction level seems slightly contrived. You could potentially have multiple blocks on the same core, right? That seems weird.