Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Previous | Next --- Slide 24 of 79

kidz

I did not know that compilers were capable of interpreting independency in code to automatically generate parallel code. What is the difficulty of this problem? What kind of inference does a compiler make to figure out what components of a program are dependent/independent?

rmanne

There is a way to determine whether a certain index of an array is modified or not. This isn't always easy, but if the number of transformations on the induction variables are fairly small, then it becomes easy to determine whether or not parts of the loop are dependent. There's also the Single Static Assignment form that compilers often use in their middleend to do certain optimizations, and in this form, it becomes clearer whether or not variables within a loop are modified and their modification is used in the next iteration of the loop (phi nodes of the SSA form). This is usually either a case of dead code or loop invariants.

Once all of this information is accquired, if it's easy to parallelize with this information, then you can. If it's not because of dependencies, some compilers will try fancy loop transformations like loop skewing. You can imagine that all of this takes a lot of time and effort. I can't give you an exact quantification for how hard the problem is, but getting information usually isn't hard, but you might need to get a lot of it for some problems.

memebryant

There's some information on what LLVM does under their vectorization reference. If you're interested in how it's actually implemented, LLVM is both open source and fairly straightforward to read.

maxdecmeridius

@kidz I think the "forall" keyword here indicates that the code is independent. The programmer is telling the compiler that each iteration of this loop is independent, the compiler isn't figuring this out itself.

rmanne

@maxdecmeridius I'm not sure that this is completely the case.

GCC does vectorization (given you allow it to, by setting mtune or march to something other than generic) if it finds that the body of the loops is parallelism. It's not an ISPC thing, though I'm assuming ISPC is a safer and more consistent way of doing it (it seems to have a 'const varying' type that isn't available in most other languages that aren't concerned with parallelism like C). If you want to make sure that GCC really does do vectorization:

void f(int *x, int *y) {
  for (int i = 0; i < 1000000; i++)
    y[i] *= x[i];
}

And compile it with gcc -S -O3 -march=native f.c Open up the assembly file and search for ymm or xmm. There, GCC did some vectorization for you.

edit: the previous one is pretty unreadable due to the extra optimizations done at -O3, just switch on -O2 -ftree-vectorize and it should be pretty neat and a total of about 32 lines.