Instead of returning a pointer to the allocated space (like C's malloc), cudaMalloc returns a CUDA error code, so it has to hand back the allocated pointer through its first parameter. The common C idiom for this is: 1. declare a variable, 2. pass its address to the function, 3. let the function write the result through that pointer.
We can also write a wrapper for cudaMalloc that restores the malloc-style interface (note that the CUDA success code is cudaSuccess):
$$
void *CudaMalloc(size_t len)
{
    void *p;
    if (cudaMalloc(&p, len) == cudaSuccess)
        return p;
    return NULL;
}
$$
amolakn
I was curious about something in class. How expensive is this memory copying? I feel like it would be a significant bottleneck. Is there such a thing as a shared memory address space between the CPU and GPU?
flxpox
Rather than passing a pointer, cudaMemcpy is more like a message that tells the GPU to copy the given bytes into its device memory space.
ferozenaina
As the professor said, cudaMemcpy is closer to passing a message with the data contents to the GPU. This operation is fairly expensive, and its bandwidth is limited by the PCIe bus.
I don't think the GPU and CPU can share an address space. However, the GPU and other devices on the PCIe bus can - http://docs.nvidia.com/cuda/gpudirect-rdma/#axzz4LxQMnlAq.
Why does $ cudaMalloc $ take a double pointer (&deviceA) instead of using a prototype like C's malloc, $ void *malloc(size_t size); $ ?
This is explained in the Stack Overflow answer quoted at the top of the thread.