Slide 25 of 54
tliao

Question: Do the "host" code and the "device" code run in separate address spaces? For the saxpy part of Assignment 2, I tried running the code without a cudaMemcpy and it immediately segfaulted, which seems to suggest so.

kayvonf

Yes, you should consider host addresses and device addresses to reside in different address spaces.

I don't believe your code should segfault, provided the device global memory accessed by the CUDA threads is properly allocated. Leaving out the cudaMemcpy should simply skip the transfer of data from the host address space to the device address space. Can you re-check? (We can take the discussion over to Piazza if there are issues.)
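To make the two address spaces concrete, here is a minimal sketch of a saxpy wrapper with explicit allocation and copies. The names saxpyKernel and saxpyOnDevice are hypothetical and are not taken from the assignment's starter code; the sketch also assumes n > 0.

```cuda
#include <cuda_runtime.h>

// Kernel: runs on the device and may only dereference device pointers.
__global__ void saxpyKernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical host wrapper. hostX and hostY are ordinary host arrays.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t saxpyOnDevice(int n, float a, const float* hostX, float* hostY) {
    float* devX = nullptr;
    float* devY = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate buffers in the device address space.
    cudaError_t err = cudaMalloc((void**)&devX, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&devY, bytes);
    if (err != cudaSuccess) { cudaFree(devX); return err; }

    // Host -> device copies. Skipping these leaves devX/devY with
    // unspecified contents, but the kernel can still access them.
    cudaMemcpy(devX, hostX, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(devY, hostY, bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    saxpyKernel<<<blocks, threadsPerBlock>>>(n, a, devX, devY);

    // Device -> host copy. This blocking copy on the default stream
    // runs after the kernel has finished.
    err = cudaMemcpy(hostY, devY, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devX);
    cudaFree(devY);
    return err;
}
```

With this structure, dropping the host-to-device copies only changes the values the kernel reads. A crash on the host is more typically a sign that a device pointer such as devX is being dereferenced in host code.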

bottie

A summary of the CUDA function declaration qualifiers:

Declaration                          Executed on   Only callable from
__device__ float DeviceFunction()    device        device
__global__ void KernelFunction()     device        host
__host__ float HostFunction()        host          host

The kernel (__global__) function is the entry point the host launches to start parallel execution on the device. Note: a kernel function must return void.

I think this helps in understanding that host and device code run in different address spaces.
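The three qualifiers in the summary can be sketched together in one small file. All function names here (square, squareKernel, squareAll) are made up for illustration, and the sketch assumes n > 0.

```cuda
#include <cuda_runtime.h>

// __device__: executes on the device, callable only from device code.
__device__ float square(float v) {
    return v * v;
}

// __global__: a kernel. Executes on the device, is launched from the
// host, and must return void.
__global__ void squareKernel(int n, float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = square(data[i]);  // device-to-device call
    }
}

// __host__: executes on the host, callable only from host code.
// Squares hostData in place by round-tripping through device memory.
__host__ cudaError_t squareAll(int n, float* hostData) {
    float* devData = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&devData, bytes);
    if (err != cudaSuccess) return err;

    cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 128;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    squareKernel<<<blocks, threadsPerBlock>>>(n, devData);  // host-to-device launch

    err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devData);
    return err;
}
```

Calling square() directly from squareAll() would be rejected by the compiler, since a __device__ function is not callable from host code.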

sfackler

A function declared without any of __device__, __global__ and __host__ is equivalent to that function declared with only __host__.

jcmacdon

It is also interesting to note that __global__ functions can now be called from the device on devices that have compute capability 3.5 or higher, a feature called dynamic parallelism (according to the CUDA C Programming Guide).

xs33

Question: Is it bad style to call malloc from a CUDA kernel? Are there any situations where calling malloc would even be encouraged?

kfc9001

@xs33 I'd imagine it would be bad. All the threads in a single warp would start executing malloc in lockstep. However, if you're talking about a device specific malloc that allocates memory out of each thread's local storage, that might be more feasible.

The safer way to do this would obviously be to malloc each thread's storage up front, and then parcel it out to each thread as it runs.
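A minimal sketch of the up-front approach, assuming every thread needs the same fixed amount of scratch space: the host makes one cudaMalloc call sized for all threads, and each thread indexes its own slice. The names useScratch and runWithScratch are hypothetical, and the sketch assumes numThreads > 0 and perThread > 0.

```cuda
#include <cuda_runtime.h>

// Each thread owns the slice scratch[tid * perThread .. tid * perThread + perThread - 1].
__global__ void useScratch(int* scratch, int perThread, int* out, int numThreads) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= numThreads) return;

    int* mine = scratch + static_cast<size_t>(tid) * perThread;

    // Fill the private slice, then reduce it, as a stand-in for real work.
    for (int k = 0; k < perThread; ++k) {
        mine[k] = tid + k;
    }
    int sum = 0;
    for (int k = 0; k < perThread; ++k) {
        sum += mine[k];
    }
    out[tid] = sum;
}

// Hypothetical host wrapper: one allocation up front, no malloc in the kernel.
cudaError_t runWithScratch(int numThreads, int perThread, int* hostOut) {
    int* devScratch = nullptr;
    int* devOut = nullptr;
    size_t scratchBytes = static_cast<size_t>(numThreads) * perThread * sizeof(int);
    size_t outBytes = static_cast<size_t>(numThreads) * sizeof(int);

    cudaError_t err = cudaMalloc((void**)&devScratch, scratchBytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&devOut, outBytes);
    if (err != cudaSuccess) { cudaFree(devScratch); return err; }

    const int threadsPerBlock = 128;
    const int blocks = (numThreads + threadsPerBlock - 1) / threadsPerBlock;
    useScratch<<<blocks, threadsPerBlock>>>(devScratch, perThread, devOut, numThreads);

    // Blocking copy; completes after the kernel has finished.
    err = cudaMemcpy(hostOut, devOut, outBytes, cudaMemcpyDeviceToHost);

    cudaFree(devScratch);
    cudaFree(devOut);
    return err;
}
```

For comparison, CUDA does provide a device-side malloc on compute capability 2.0 and higher. It allocates from a fixed-size heap in global memory rather than from per-thread local storage, so the preallocated layout above is usually the simpler choice when sizes are known in advance.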

aawright

If you want to write a function that is accessible on both host and device (you might do this for a function like max) you can do this by adding both __host__ and __device__ to the function declaration.
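As a small sketch of this, the function below is compiled for both the host and the device, so the same source can be called from a kernel and from ordinary host code. The names maxOf, clampBelowKernel, and clampBelow are hypothetical, and the sketch assumes n > 0.

```cuda
#include <cuda_runtime.h>

// Compiled twice: once for the host and once for the device.
__host__ __device__ inline float maxOf(float a, float b) {
    return a > b ? a : b;
}

// Device-side use: raise every element to at least lowerBound.
__global__ void clampBelowKernel(int n, float lowerBound, float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = maxOf(data[i], lowerBound);
    }
}

// Hypothetical host wrapper that clamps hostData in place on the device.
cudaError_t clampBelow(int n, float lowerBound, float* hostData) {
    float* devData = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&devData, bytes);
    if (err != cudaSuccess) return err;

    cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 128;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    clampBelowKernel<<<blocks, threadsPerBlock>>>(n, lowerBound, devData);

    err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devData);
    return err;
}
```

Host code can also call maxOf(1.0f, 2.0f) directly, with no kernel launch involved.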