This lecture talked about three types of assignments. The first is static assignment where the work given to the threads is pre-determined. Static assignment is best suited for when we know the cost and amount of work ahead of time. The second is semi-static assignment, which combines static and dynamic assignments by dynamically updating a static assignment. Semi-static assignment is used when we can predict the future work based off of past results. The third type of assignment is dynamic assignment. In dynamic assignment, the costs or number of tasks are unpredictable, tasks are assigned at runtime by the program.
Static assignment is a method of dividing up work in a parallel program where the work assigned to each thread is determined before runtime.
The biggest advantage for this type of assignment is there is almost no runtime overhead. This type of assignment is good for programs where the workload is predictable. Static assignment works best when full knowledge about the workload of a program is known.
However, static assignment is terrible for programs where the workload is unpredictable. Once the workload is divided during the static assignment stage, it cannot be changed. Thus, if an unexpected event were to happen, the program cannot respond to it effectively. Also, if there were to be an error in the division of work, the program cannot fix itself. Static assignment is limited to programs where full knowledge of its workload is known. If that requirement cannot be satisfied, semi-static assignment and dynamic assignment are good alternatives to parallelize programs.
A good example for static assignment is the mandlebrot image in homework 1 (see Figure 1). Since the workload of the image is fully known, static assignment is an excellent choice for rendering this image in parallel.
Semi-Static assignment combines both static and dynamic assignments by dynamically adjusting a static assignment when necessary. Ideally, semi-static assignment would be used for an unpredictable task that can be broken up into predictable chunks where the assignment of these predictable chunks can be determined from past data. These predictable chunks are computed using a static assignment and when each predictable chunk is completed, the static assignment is rearranged dynamically based on the new data. By adjusting the static assignment, the work distribution will be more even, and by adjusting it only when necessary, the computation overhead of reassigning work is reduced.
For example, a particle simulation (as seen in Figure 2) where the particles do not move quickly. This simulation takes an input of particles and their velocites and other attributes and outputs their position each timestep. Between each timestep, the particles do not move very much, so redistributing the particles every 1000 time steps would be sufficient. Under a static assignment, the particles would never be reassigned and as particles moved the assignment of particles to processors would essentially become randomized. A dynamic assignment, however, would waste time assigning particles that don't need reassignment. It is important to note that this is only because the simulation can be broken into predictable chunks. If the particles are fast moving or completely random, the work distribution would quickly become uneven and the dynamic reassignments would lose its purpose.
Another example is an adaptive mesh as seen in Figure 3. This simulation shows an airplane wing and takes an input of the wing's shape and the wind source. It outputs which areas around the wing have the most turbulance every timestep. In an adaptive mesh, we do not know which parts of the mesh will take the most computation power until we run the simulation making the simulation. As we do not know which parts of the mesh are going to take the longest to compute, a static assignment will most likely lead to an unbalanced work distribution. On the other hand, a dynamic assignment will have a large work overhead as adaptive meshes are usually made up of many vertices. However, like the particle simulation, we can easily break the adaptive mesh simulation into smaller predictable chunks.
Assignment is determined at runtime by the program itself, because the number of tasks is unpredictable and cannot be distributed beforehand. The dynamic assignment model here uses a shared work queue (Figure 4), which is used to list all the tasks that need to be done. Worker threads will pull tasks from the queue to do, and when they are done with their current task they can pull another from the queue. So, worker threads never idle until the work queue is empty, and never have to wait for another thread to finish.
A problem with this implementation however, is that when there are critical regions where the program is sequential, only one processor can spend time in that critical region, and when all the critical regions are added together its the total time the programs spends in serial execution. This distribution can be seen in Figure 5, where the black regions are the critical regions and the blue regions are the time spent on the tasks.
However, if the time it takes to process a critical region is short, and the actual task itself takes a long time, then this does not affect performance very much at all. The real issue is how to balance between task granularity and decreasing synchronization overhead. We want to have more tasks so that it gives a good workload for the system and the flexibility to use this assignment, but we also want fewer tasks to minimize the overhead of synchronization. One way to get around to this is to have multiple work queues (Figure 6), one for each thread, where each thread pushes and pulls from its own work queue.
When a thread's queue is empty, they can pull work from other threads. This technique has high locality and low contention, which means there is low overhead. Another way is to have smarter task scheduling, where some threads have a one large task, and other threads have many smaller tasks to balance out the workload.
- What are the pros and cons of each type of assignment?
- Name an example program for each assignment and explain why that assignment is best. Try to think of one that has not been provided already.
- What are other ways we can distribute workload in dynamic assignment?
- Are there any cases where static assignment is bad even when we have full knowledge of a program's workload?