Previous | Next --- Slide 19 of 35
Back to Lecture Thumbnails
gbarboza

MC scaling: Since the memory usage per processor is fixed, we have the nice property that communication per computation decreases as both N and P increase. By looking at the previous slide, this becomes clear if we think about dividing the 2D grid ever more and sticking more dots inside each cell, and then imagining how much communication needs to occur as each of these increase. Essentially, each cell becomes more isolated in its work and so we see the best speedup.

sjoyner

Communication overheads limits problem constrained scaling since as you break the grid into more subproblem, the number of grid blocks increases leading to a higher chance that an element will be on a block boundary.

mschervi

I am trying to understand how some of these values were computed. My understanding (for the time constrained scaling example) is:

  1. Execution time is fixed at $O(n^3)$ by definition.

  2. The total size of the grid $k \times k$ for some k

  3. We need to calculate k. We assume that we will get a speedup linear in the amount of processors that we have, so we need $k^3/P$ to be $n^3$. This gives us that $k=nP^{1/3}$

  4. Each processor is responsible for 1/P of the total grid. This is $k^2/P$ elements. Substituting the formula for k from step 3 we have that each processor is responsible for computing $n^2/P^{1/3}$ elements. This is the amount of memory that each processor will use.

  5. Communication to computation ratio = elements communicated/ elements computed. Elements computed is $n^2/P^{1/3}$ from line 4. Elements communicated is $k/\sqrt{P}$ since k is the size of the whole grid. So the ratio is $(k/\sqrt{P})/(n^2/P^{1/3})$ $\ = k*P^{1/3}/n^2 \sqrt{P} $ $\ = nP^{1/3}P^{1/3}/n^2 \sqrt{P} $ $\ = P^{1/6}/n$

Question Why was n left out of the Big O for comm-to-comp ratio in TC scaling but included in MC scaling?

martin

I'm confused about the commu-to-comp ratio in MC scaling. computed element per processor should be NNP / P = N^2 communicated element per processor should be N*(P^1/2)/(P^1/2) = N And the ratio is N/N^2 = 1/N. Can someone help me out here?

xiaowend

@martin: In my opinion, by definition, total work is (NP^1/2)^3 = (N^3)(P^3/2), so work for each processor is (N^3)(P^3/2)/P = (N^3)(P^1/2), so comm-to-comp should be N(N*P^1/2)/(N^3)(P^1/2) = 1/N

EDIT: @acappiello You are right, but comm-to-comp you compute should be (N^2)(P^1/2) / (N^3)(P^1/2) = 1/N, which is consistent with what @martin gets.

EDIT2: I corrected the mistake.

acappiello

@mschervi I believe that your analysis is totally correct. Since n is a constant, we can ignore it when writing the big O, but I'm not sure why there is a discrepancy.

@martin I'm having trouble with this too. Kayvon says that we're doing a lot more work, but communicating the exact same amount (video 46:00), which isn't totally true. We have more blocks in the grid as P increases, so communication goes up with P.

@xiaowend I don't think that's totally correct. Communication has to occur after each convergence iteration. So, the amount of communication over the entire program execution is "communication per iteration" x "number of iterations" which is $NP \times NP^{1/2}$. Total work is given by $(NP^{1/2})^3$, as you've said. So, I get a comm-to-comp of $(N^2P^{3/2})/(N^3P^{3/2}) = 1/(N)$.

EDIT: I don't think the above is still totally right. Thinking...

EDIT2: Made some changes to the calculation.

nbarshay

Here is an argument for why the comm-to-comp ratio under MC scaling is constant. Note that the number of iterations is irrelevant for computing this ratio, because increasing iterations scales comm and comp by the same factor. As we MC scale, the size of the sub-grid seen by each processor remains constant. Thus the ratio between it's area (comp per iteration) and it's perimeter (comm per iteration) remains constant. Thus the comm-comp ratio is constant.

What is wrong with this argument?