Slide View : Parallel Computer Architecture and Programming : 15-418/618 Fall 2016

Parallel Programming Abstractions (and their corresponding HW/SW implementations)

Slide 12 of 56

Nesuna

We agreed that when SIMD is being used, interleaved assignment is better because it allows more efficient parallelism, so I was wondering: when is blocked assignment the better choice? Would locality be its only advantage?

BBB

When memory access is the major cost, the question is always how we can maximize locality within each core. Blocked assignment is more effective when assigning work to different cores, since each core then touches a smaller, contiguous region of memory. Interleaving, on the other hand, is effective when the work is distributed within a core (across its SIMD lanes), since the accesses made by a single instruction go to consecutive addresses.
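A minimal sketch of the two assignments being compared, assuming a 1-D array of n elements split over a number of workers (cores or program instances). The function names `blocked_indices` and `interleaved_indices` are hypothetical, and indices are 0-based. Under blocking, each worker owns one contiguous chunk; under interleaving, each worker owns every num_workers-th element.

```python
def blocked_indices(n, num_workers, worker):
    """Indices owned by `worker` under a blocked assignment:
    one contiguous chunk per worker (the last chunk may be shorter or empty)."""
    chunk = -(-n // num_workers)  # ceiling division
    start = min(worker * chunk, n)
    end = min(start + chunk, n)
    return list(range(start, end))


def interleaved_indices(n, num_workers, worker):
    """Indices owned by `worker` under an interleaved assignment:
    every num_workers-th element, starting at the worker's own id."""
    return list(range(worker, n, num_workers))


if __name__ == "__main__":
    n, workers = 16, 4
    for w in range(workers):
        print(w, blocked_indices(n, workers, w), interleaved_indices(n, workers, w))
```

With 16 elements and 4 workers, worker 1 gets [4, 5, 6, 7] under blocking and [1, 5, 9, 13] under interleaving.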

We can improve on pure blocking or pure interleaving by combining the two: assign blocks of work to cores, then interleave within each block to subdivide the work among that core's SIMD lanes.
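One way to sketch the combined scheme, assuming n elements, `num_cores` cores, and `simd_width` SIMD lanes per core (all hypothetical parameters of a hypothetical `hybrid_schedule` helper). Each core gets a contiguous block, and within that block lane i handles block_start + i, block_start + i + simd_width, and so on, so every step of a core's loop touches adjacent indices.

```python
def hybrid_schedule(n, num_cores, simd_width):
    """Return schedule[core][step] = indices processed together by the core's
    SIMD lanes at that step. Blocked across cores, interleaved across lanes."""
    chunk = -(-n // num_cores)  # ceiling division: block size per core
    schedule = []
    for core in range(num_cores):
        start = min(core * chunk, n)
        end = min(start + chunk, n)
        steps = []
        for base in range(start, end, simd_width):
            # One "vector instruction": lane i works on base + i,
            # so the lanes touch adjacent indices within this core's block.
            steps.append(list(range(base, min(base + simd_width, end))))
        schedule.append(steps)
    return schedule


if __name__ == "__main__":
    for core, steps in enumerate(hybrid_schedule(16, 2, 4)):
        print("core", core, steps)
```

For example, `hybrid_schedule(16, 2, 4)` gives core 0 the steps [0, 1, 2, 3] and [4, 5, 6, 7], and core 1 the steps [8, 9, 10, 11] and [12, 13, 14, 15].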

This assumes a fairly simple access pattern for the program, such as iterating over an array. More complex access patterns may change which work assignment maximizes locality.

neonachronism

The issue isn't really about access patterns; it's about expressing an operation in a way that the compiler can convert to SIMD instructions.

For the interleaved example, you have four "program instances", and the first thing they do is operate on memory cells 1, 2, 3, 4, respectively. The compiler can see that these four adjacent accesses happen at the same step and combine them into a single vector instruction.

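In the blocked example, the first thing the instances do is operate on cells 1, 5, 9, 13. These cells are not adjacent, so they can't be combined into a single contiguous vector load or store; the compiler would have to fall back to a slower gather or scatter.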
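The point about the first step can be checked with a small sketch. It assumes `program_count` program instances (ISPC-style terminology), n divisible by `program_count`, and 0-based indices, so the cells 1, 2, 3, 4 above become 0, 1, 2, 3. `step_addresses` and `is_contiguous` are hypothetical helpers: the first lists which indices the instances touch together at a given loop step, and the second checks whether those indices are adjacent (coverable by one packed vector access) or not (needing a gather).

```python
def step_addresses(n, program_count, step, interleaved):
    """Indices touched by program instances 0..program_count-1 at one loop step.
    Interleaved: instance i handles step * program_count + i.
    Blocked: instance i handles i * (n // program_count) + step."""
    if n % program_count:
        raise ValueError("this sketch assumes n is divisible by program_count")
    if interleaved:
        return [step * program_count + i for i in range(program_count)]
    block = n // program_count
    return [i * block + step for i in range(program_count)]


def is_contiguous(indices):
    """True if the indices are consecutive, i.e. one packed vector load/store
    could cover them; otherwise a gather/scatter would be needed."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


if __name__ == "__main__":
    for interleaved in (True, False):
        addrs = step_addresses(16, 4, 0, interleaved)
        label = "interleaved" if interleaved else "blocked"
        print(label, addrs, is_contiguous(addrs))
```

At step 0 with 16 elements and 4 instances, the interleaved mapping touches [0, 1, 2, 3] (contiguous) while the blocked mapping touches [0, 4, 8, 12] (strided by 4).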