yrkumar

Question: If we were to design a scheduling algorithm for heterogeneous processors, then presumably the programmer would have to specify which threads require "fatter" or "thinner" cores. How would this translate to pieces of code running on devices with different hardware (i.e. different numbers of fat/thin cores and various fixed-function units)?

sbly

That's not necessarily true. If we had a pool of worker threads that each took a task from some shared queue, it's plausible we could estimate the size of a task without the programmer's intervention. If the task had to read a file, we could estimate its running time from the size of the file; if it had to compute something in a loop, the running time would probably be proportional to the number of loop iterations. The scheduler could then use this information to make smarter decisions.

Also, scheduling algorithms don't have to know the size of jobs at all. These are called "oblivious" scheduling algorithms; examples include First-Come-First-Served, Random, Processor Sharing, and Foreground-Background scheduling. The last one estimates the size of a job not from any information about the job itself, but only from how long the job has been running so far.
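
To make the first idea concrete, here is a minimal C++ sketch (mine, not from the course) of a shared queue where each task carries a cost hint derived from its input. The `Task` struct, the `estimated_cost` field, and the single "fat" worker are all illustrative assumptions; a real scheduler would also run "thin" workers pulling cheap tasks.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Task {
    std::function<void()> work;
    std::size_t estimated_cost;  // e.g., file size in bytes or loop trip count
};

struct CostOrder {
    bool operator()(const Task& a, const Task& b) const {
        return a.estimated_cost < b.estimated_cost;  // max-heap: biggest task on top
    }
};

int main() {
    std::priority_queue<Task, std::vector<Task>, CostOrder> queue;
    std::mutex m;

    // Enqueue tasks whose cost hints come from their inputs (here, the
    // loop trip count), not from programmer annotations.
    for (std::size_t n : {1000u, 10u, 500000u})
        queue.push({[n] {
                        volatile std::size_t s = 0;
                        for (std::size_t i = 0; i < n; i++) s = s + i;
                    },
                    n});

    // A "fat"-core worker drains the most expensive tasks first.
    std::thread fat([&] {
        for (;;) {
            Task t;
            {
                std::lock_guard<std::mutex> lk(m);
                if (queue.empty()) return;
                t = queue.top();
                queue.pop();
            }
            t.work();  // run outside the lock
        }
    });
    fat.join();
}
```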

jmnash

One example of a processor that could benefit from heterogeneous resources is one made to run physics simulations, such as those for particle physics. Some parts may parallelize well because many particles are affected at one time, but sometimes there might be a chain reaction where only a few particles are affected at each time step, which would not parallelize well. Also, sometimes certain particles may require more computation than others to determine their new state, whereas at other times all particles are affected equally, which affects the usefulness of SIMD execution. Finally, at the beginning of the simulation, particles that are next to each other will probably also be next to each other in memory, which makes data access predictable when a force acts on one region of particles. After they scatter, however, there is no guarantee that particles in one area of the simulation will be near each other in memory, causing unpredictable data access.
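
To make the SIMD point concrete, here is a small sketch (illustrative numbers, hypothetical `Particle` struct) that computes the lane utilization an 8-wide SIMD unit would achieve when one particle needs far more solver iterations than its neighbors:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Particle { float x; int steps_needed; };  // steps_needed: per-particle work

int main() {
    // Seven "easy" particles and one that needs a long chain of updates.
    std::vector<Particle> p = {{0, 3}, {0, 40}, {0, 3}, {0, 3},
                               {0, 3}, {0, 3}, {0, 3}, {0, 3}};

    // An 8-wide SIMD unit processing these together runs for
    // max(steps) = 40 iterations, but does useful work for only
    // sum(steps) = 61 of the 8 * 40 = 320 lane-iterations (~19%).
    int max_steps = 0, total_steps = 0;
    for (const Particle& q : p) {
        max_steps = std::max(max_steps, q.steps_needed);
        total_steps += q.steps_needed;
    }
    std::printf("SIMD utilization: %.1f%%\n",
                100.0 * total_steps / (8.0 * max_steps));
}
```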

arjunh

Heterogeneous computing is more than just using a mixture of CPUs and GPUs to solve a problem. Currently, CPUs and GPUs don't mix well, as they tend to operate in entirely different environments.

Communication requires expensive memory copies between the CPU and the GPU, which limits the usefulness of the GPU: offloading is only beneficial when the cost of the computation is significantly higher than the cost of communication. We saw this when implementing a CUDA-based scan in assignment 2; for relatively small values of n, the serial version was actually substantially faster than the CUDA version.
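
A back-of-the-envelope model makes the crossover concrete. This is a sketch with made-up constants for PCIe bandwidth, per-element throughput, and fixed launch/copy latency, not measurements from the assignment:

```cpp
#include <cstdio>

int main() {
    const double pcie_bw  = 12e9;    // bytes/s host<->device (assumed)
    const double cpu_rate = 1e9;     // scan elements/s on CPU (assumed)
    const double gpu_rate = 20e9;    // scan elements/s on GPU (assumed)
    const double overhead = 20e-6;   // fixed launch + copy setup latency (assumed)

    for (long n = 1L << 10; n <= (1L << 28); n <<= 4) {
        double bytes = 2.0 * n * sizeof(float);  // copy input in, result out
        double t_cpu = n / cpu_rate;
        double t_gpu = overhead + n / gpu_rate + bytes / pcie_bw;
        std::printf("n=%9ld  cpu=%.2e s  gpu=%.2e s  %s\n",
                    n, t_cpu, t_gpu, t_gpu < t_cpu ? "GPU wins" : "CPU wins");
    }
}
```

With these (assumed) constants, the fixed overhead dominates for small n, so the serial version wins until n reaches several tens of thousands of elements; only past that point does offloading pay off.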

ruoyul

To summarize, the motivation for a heterogeneous mixture of resources is that most real-world applications have complicated, changing workloads. Having heterogeneous resources ensures we have the right tool for the job when the situation changes.