Is this where the user-mode drivers for GPUs come into the picture? Do they expose APIs that can be used for synchronization, say if we are not using pthreads?
We discussed in lecture that decomposition is done by the programmer, and that assignment can be done either by the programmer or the compiler. Who is responsible for the orchestration aspect? Would it be the kernel?
It is my understanding that this synchronization is usually done by the kernel.
I agree with @pkoenig, since anything that helps the orchestration done by the programmer would have to be part of the "assigning" process.
When the kernel performs orchestration, are there explicit ways to tell it how long to spend orchestrating or to pass it information that helps, or does the kernel make these decisions based on its own information and its own architecture? How far does orchestration go in trying to reduce the costs of communication and synchronization?
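To make the decomposition / assignment / orchestration split from lecture concrete, here is a minimal sketch using Python's standard `threading` module. It is only an illustration, not something from the lecture: `parallel_sum`, `worker`, and `num_threads` are hypothetical names, and the static chunking is just one possible assignment. The point is that the programmer picks the synchronization points (the lock and the joins) through a library API, whatever the runtime or OS does underneath to implement them.

```python
import threading

def parallel_sum(data, num_threads=4):
    """Sum `data` with explicit decomposition, assignment, and orchestration.

    Hypothetical helper for illustration only.
    """
    # Decomposition: split the work into one contiguous chunk per thread.
    chunk = (len(data) + num_threads - 1) // num_threads
    chunks = [data[i * chunk:(i + 1) * chunk] for i in range(num_threads)]

    total = 0
    lock = threading.Lock()  # orchestration: protects the shared accumulator

    def worker(part):
        nonlocal total
        local = sum(part)    # private work, no communication needed
        with lock:           # synchronization point chosen by the programmer
            total += local

    # Assignment: chunk i is statically assigned to thread i.
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # orchestration: wait for every worker to finish
    return total
```

Each worker accumulates into a private `local` and takes the lock only once, which is one way to keep communication and synchronization costs down. Because of CPython's global interpreter lock this sketch shows the structure rather than a real speedup.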