Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Performance Optimization II: Locality, Communication, & Contention

Slide 18 of 47

Jing

I think arithmetic intensity is sometimes not a good metric for the performance of our program, because it does not take into account the cost of each computation or the time each communication takes. I believe it would make more sense to multiply the numerator by a factor representing the amount of computation for each cell in our problem, and to multiply the denominator by a factor capturing the communication time. Two solutions with the same arithmetic intensity do not necessarily have the same performance. So why would we ignore those factors and make this metric misleading?

russt17

@Jing

Yes, I think this case is a special one where arithmetic intensity is directly useful. Since both solutions perform the same number of computations, higher arithmetic intensity is equivalent to a shorter run time, because the only difference has to come from less communication. Anyway, agreeing with what you're saying, and writing AI = (#comps)/(#comms), we should have:

Total comp time = time_per_comp*(#comps) + time_per_comm*(#comms)

                = (#comps)*(time_per_comp + time_per_comm/AI)
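A minimal Python sketch of the cost model above, assuming AI is defined as #comps / #comms (so #comms = #comps / AI). The function names and numbers are hypothetical, picked so the floating-point arithmetic is exact; the point is only that both forms of the formula give the same total.

```python
def total_time(comps, comms, time_per_comp, time_per_comm):
    """Computation time plus communication time (first form of the formula)."""
    return time_per_comp * comps + time_per_comm * comms


def total_time_via_ai(comps, ai, time_per_comp, time_per_comm):
    """The same cost after substituting #comms = #comps / AI (second form)."""
    return comps * (time_per_comp + time_per_comm / ai)


# Hypothetical counts and per-operation costs.
comps, comms = 1_000_000, 5_000
time_per_comp, time_per_comm = 1.0, 100.0
ai = comps / comms  # 200.0 computations per communicated element

print(total_time(comps, comms, time_per_comp, time_per_comm))      # 1500000.0
print(total_time_via_ai(comps, ai, time_per_comp, time_per_comm))  # 1500000.0
```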

Now, most of the time, time_per_comm is much larger than time_per_comp. If our AI is small, then time_per_comm/AI is large, time_per_comp is insignificant by comparison, and

Total comp time = (#comps)*(epsilon + time_per_comm/AI)

This illustrates that if increasing #comps by some factor increases AI by a strictly larger factor, you should probably do it. So you could say something like "check it out, I had to double the number of computations, but the arithmetic intensity went up by 2.3 times!" and this would be a good thing because of the assumption/approximation above.
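To make the trade-off concrete, here is a small sketch of the communication-dominated approximation, assuming total time is roughly #comps * time_per_comm / AI with the epsilon term dropped. Under that assumption, scaling #comps by one factor and AI by another multiplies the run time by their ratio. The helper names and starting numbers are made up for illustration.

```python
def approx_time(comps, ai, time_per_comm):
    """Communication-dominated approximation: the epsilon (time_per_comp) term is dropped."""
    return comps * time_per_comm / ai


def worth_it(comp_factor, ai_factor):
    """Scaling #comps by comp_factor and AI by ai_factor multiplies the approximate
    run time by comp_factor / ai_factor, so it pays off only if AI grows faster."""
    return ai_factor > comp_factor


# Doubled computations, AI up 2.3x (hypothetical starting point).
before = approx_time(comps=1_000, ai=10.0, time_per_comm=100.0)
after = approx_time(comps=2_000, ai=23.0, time_per_comm=100.0)
print(after / before)      # about 0.87, i.e. 2 / 2.3
print(worth_it(2.0, 2.3))  # True
print(worth_it(2.0, 1.5))  # False
```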

But I agree, changing the AI alone isn't meaningful. For example, you could add a bunch of useless computations to increase the AI indefinitely, but the # of computations would grow at the same rate, and you'd get hit by time_per_comp, which would no longer be negligible.
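The padding argument can be sketched with the full model, where time_per_comp is kept. Communication is held fixed while hypothetical useless operations multiply the work; the base counts and per-operation costs are illustrative assumptions, not measurements.

```python
def total_time(comps, comms, time_per_comp, time_per_comm):
    """Full cost model: computation time plus communication time."""
    return time_per_comp * comps + time_per_comm * comms


def padded_runs(paddings, base_comps=10_000, comms=1_000,
                time_per_comp=1.0, time_per_comm=100.0):
    """Multiply the work by useless extra operations while communication stays fixed.
    Returns (AI, total time) for each padding factor. All numbers are hypothetical."""
    results = []
    for padding in paddings:
        comps = base_comps * padding
        ai = comps / comms
        results.append((ai, total_time(comps, comms, time_per_comp, time_per_comm)))
    return results


for ai, t in padded_runs([1, 10, 100]):
    print(f"AI = {ai:7.1f}   time = {t:10.1f}")
```

Across the three runs AI grows 100x, but total time rises from 110000.0 to 1100000.0 because the computation term stops being negligible.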

aznshodan

Can someone explain to me why the number of elements communicated is N/(P^(1/2))? And why is the arithmetic intensity the same as the number of elements communicated?

paluri

@aznshodan To answer your questions:

1) Each processor communicates c * N/(P^(1/2)) elements, where c is a constant. The total number of elements is N^2, so each row and each column has size N. Assume each processor shares the top and bottom rows and the left and right columns of its square block with the appropriate neighboring processors, so the constant c is 4. So what is the size of a row (or, equivalently, a column) within one processor's block? It is simply (number of elements in a row / number of processors along that row), which is N/(P^(1/2)), since the P processors form a P^(1/2) x P^(1/2) grid. I hope that makes sense.
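The geometry in point 1 can be checked with a short sketch. It assumes P is a perfect square so the processors form a P^(1/2) x P^(1/2) grid, that each block exchanges its four edges (c = 4), and it ignores the fact that processors on the outer boundary have fewer neighbors. The function name is hypothetical.

```python
import math


def block_geometry(n, p):
    """Square-block decomposition of an n x n grid over p processors arranged
    as a sqrt(p) x sqrt(p) grid. Returns (block side, elements owned, elements communicated)."""
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("p must be a perfect square for this sketch")
    block = n / side                   # row (or column) length inside one block: N / P^(1/2)
    elements_owned = block * block     # N^2 / P
    elements_communicated = 4 * block  # top, bottom, left and right edges: c = 4
    return block, elements_owned, elements_communicated


print(block_geometry(1024, 16))  # (256.0, 65536.0, 1024.0)
```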

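2) The answer to this question is just algebra. Arithmetic intensity = number of computations / number of communicated elements. Per processor, that is (N^2 / P) / (N/(P^(1/2))) = (N^2 / P) * (P^(1/2) / N) = (N * N * P^(1/2)) / (N * P) = (N * P^(1/2)) / (P^(1/2) * P^(1/2)) = N / P^(1/2). Note that this drops the constant c; keeping it gives N / (c * P^(1/2)), so the two expressions match only up to that constant, but they scale the same way. QED.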
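The algebra in point 2 can also be checked numerically. This sketch assumes one computation per owned element (N^2 / P per processor) and the c * N / P^(1/2) communication count from point 1; `arithmetic_intensity` is a hypothetical helper name.

```python
import math


def arithmetic_intensity(n, p, c=1):
    """Per-processor AI for the square-block grid: (N^2 / P) / (c * N / P^(1/2))."""
    computations = n * n / p             # assumption: one computation per owned element
    communicated = c * n / math.sqrt(p)  # edge elements exchanged with neighbors
    return computations / communicated


n, p = 1024, 16
print(arithmetic_intensity(n, p))       # 256.0, equal to N / P^(1/2)
print(arithmetic_intensity(n, p, c=4))  # 64.0, the constant only rescales it
```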