__global__ marks a global function (a kernel) that can be called from the host side using the CUDA kernel-call syntax. __device__ marks a function that can only be called from device code, i.e., by other device functions or by global functions running on the device.
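A minimal sketch of the distinction (the function names `doubleValue` and `doubleAll` are made up for illustration):

```cuda
#include <cstdio>

// __device__: callable only from device code (kernels or other device functions).
__device__ float doubleValue(float x) {
    return 2.0f * x;
}

// __global__: a kernel, launched from the host with <<<blocks, threads>>>.
__global__ void doubleAll(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = doubleValue(data[i]);  // device function runs inside this thread
}

int main() {
    const int n = 8;
    float host[n], *dev;
    for (int i = 0; i < n; i++) host[i] = (float)i;

    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    doubleAll<<<1, n>>>(dev, n);  // CUDA kernel call from the host side
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; i++) printf("%.0f ", host[i]);
    return 0;
}
```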
How is parallelism handled in a device function? Assume there is more work than just 2*x in the device function, and that work is potentially parallelizable.
My guess is that it must be single-threaded, and the thread is the one that executes the global function.
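That matches how it works: a __device__ function executes serially within the single thread that called it, and each calling thread runs its own instance. A sketch (names made up) where the parallelism comes only from launching many threads, not from inside the device function:

```cuda
// The loop inside this device function runs serially, once per calling thread.
__device__ float sumRow(const float* row, int width) {
    float s = 0.0f;
    for (int j = 0; j < width; j++)   // serial within one thread
        s += row[j];
    return s;
}

__global__ void rowSums(const float* M, float* out, int width, int rows) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per row
    if (r < rows)
        out[r] = sumRow(M + r * width, width);      // each thread calls it independently
}
```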
How exactly are the CPU and GPU connected? Is it through a bus?
@totofufu I think the CPU and GPU are connected via the PCIe bus.
@totofufu Adding to what yangwu said: I guess the GPU is connected via the PCIe bus, and PCIe and the CPU are connected through the north bridge (on modern CPUs the PCIe controller is integrated into the CPU itself).
When writing CUDA for the device and host, @haboric mentioned that __global__ functions are called from the host side and __device__ functions are called from device and global functions. In lab 2, we use functions from files with a .cu_inl extension. What are the benefits and disadvantages of using .cu_inl files for helper functions?
@acfeng I think that is just the usual benefit of inline functions (avoiding function-call overhead), which is typically worthwhile for very small functions that are called often.
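As a sketch of the pattern (file and function names are assumed, not from the lab): a .cu_inl file is not compiled separately; it is textually #included, so its device helpers get compiled, and usually inlined, into each kernel file that includes it:

```cuda
// helpers.cu_inl -- included directly, never compiled on its own
__device__ __inline__ float clampf(float x, float lo, float hi) {
    return fminf(fmaxf(x, lo), hi);
}
```

```cuda
// kernel.cu
#include "helpers.cu_inl"   // pulls the helper into this translation unit

__global__ void clampAll(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = clampf(data[i], 0.0f, 1.0f);
}
```

The downside of this approach is the usual one for header-style inclusion: the helper is recompiled in every including file, and any change to the .cu_inl forces those files to rebuild.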
For this slide, is the __device__ float doubleValue helper redundant? Couldn't we just do the multiply directly in __global__ void matrixAddDoubleB?
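For comparison, a sketch of the two forms (the slide's code is reconstructed here from the names in the question, so signatures are assumed); the compiler typically inlines a small __device__ helper, so both versions compile to essentially the same code and the helper mainly buys readability and reuse:

```cuda
// Helper version, as on the slide.
__device__ float doubleValue(float x) { return 2.0f * x; }

__global__ void matrixAddDoubleB(float* C, const float* A, const float* B, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] + doubleValue(B[i]);   // call through the helper
}

// Equivalent direct version: the multiply is done inline in the kernel.
__global__ void matrixAddDoubleBDirect(float* C, const float* A, const float* B, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] + 2.0f * B[i];
}
```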