Slide 6 of 40
jpaulson

Put another way, depth (from 15-150 and 15-210) is a lower bound on the execution time, just as work is an upper bound.

nslobody

Can you explain depth for those of us that didn't take 15-150 or 15-210?

martin

Consider summing 8 numbers using a tree. The depth would be 3 and the work would be 7.
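A minimal sketch of this pairwise reduction (Python; the `tree_sum` name and tuple return are my own), counting work and depth as it goes:

```python
# Hypothetical sketch: sum a list pairwise, tree-style, while counting
# work (total additions) and depth (levels of dependent additions).
def tree_sum(xs):
    """Return (sum, work, depth) for a pairwise reduction of xs."""
    if len(xs) == 1:
        return xs[0], 0, 0
    mid = len(xs) // 2
    ls, lw, ld = tree_sum(xs[:mid])
    rs, rw, rd = tree_sum(xs[mid:])
    # Combining the halves costs one more addition. The halves could run
    # in parallel, so work adds up but depth only grows by 1 over the
    # deeper half.
    return ls + rs, lw + rw + 1, max(ld, rd) + 1

print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # (36, 7, 3)
```

Halving the input at each level is why depth grows logarithmically (3 = log2 8) while work stays linear (7 = 8 - 1 additions).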

acappiello

15-150 defines span/depth as an upper bound on the longest sequential part of the program, which gives us a lower bound on the overall running time, assuming an unlimited supply of processors is available and we partition the work among as many processors as we need.

This is easiest to think about by viewing a particular computation as a tree, so the example given above is a specific instance of this. Although the computations within any given row of the tree can be done independently (in parallel), computation between rows is sequential, and that sequential chain is what lower-bounds the runtime.

jpaulson

Break your computation down into a series of steps (as fine-grained as you want). Draw an edge from step A to step B if B depends on A. Then depth is the length of the longest path in this graph (where we pay to go through vertices, and the cost of a vertex is the amount of work there).

Note that this graph is a DAG (what would a cycle mean?), and any valid topological sort is an order a sequential processor could do the work. You can maximize parallelism by always working on every vertex of in-degree 0 (imagine that when you finish the work at a vertex, it and its outgoing edges disappear).
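This DAG formulation can be sketched directly — a hedged example (the `work_and_span` name, cost/deps dictionaries, and diamond DAG are all my own illustration, not from the lecture):

```python
# Hypothetical sketch of the dependency-graph view above: vertices carry
# costs, edges mark dependencies; work is the total cost of all vertices,
# span is the cost of the most expensive dependent chain.
from functools import lru_cache

def work_and_span(cost, deps):
    """cost: {vertex: time}; deps: {vertex: tuple of prerequisites}."""
    @lru_cache(maxsize=None)
    def finish(v):
        # With unlimited processors, v starts as soon as its last
        # prerequisite finishes, then pays its own cost.
        return cost[v] + max((finish(u) for u in deps.get(v, ())), default=0)
    return sum(cost.values()), max(finish(v) for v in cost)

# Diamond-ish DAG: 'a' and 'b' are independent; 'c' depends on both.
print(work_and_span({'a': 2, 'b': 5, 'c': 1}, {'c': ('a', 'b')}))  # (8, 6)
```

Here work is 2+5+1 = 8, but the longest dependent chain is b (5) then c (1), so span is 6 — no schedule, however many processors, finishes sooner.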

pebbled

@acappiello: I believe you mean lower bound on runtime ;) Here's an example computation tree, for the visually inclined:

    *
   / \
  +   -
 / \ / \
2  2 3  1

The work of computing ((2+2)*(3-1)) is given by the number of nodes in the tree (3, or 7 if you like to include the integer loads), while the span is given by the longest path from root to leaf (2 or 3, based on the same convention).
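These counts can be checked mechanically — a small sketch (the `measure` name and the tuple encoding of the tree are my own), counting only operator nodes:

```python
# Hypothetical sketch: compute work and span of an expression tree.
# A node is either a leaf int or a tuple (op, left, right).
def measure(node):
    """Return (work, span), counting operator nodes only."""
    if isinstance(node, int):
        return 0, 0          # integer loads not counted under this convention
    _, left, right = node
    lw, ls = measure(left)
    rw, rs = measure(right)
    # The two subtrees can be evaluated in parallel; the op at this node
    # adds 1 to work and 1 to the deeper subtree's span.
    return lw + rw + 1, max(ls, rs) + 1

expr = ('*', ('+', 2, 2), ('-', 3, 1))
print(measure(expr))  # (3, 2)
```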

acappiello

@pebbled Oops. Yeah, I made a mistake there. I meant to say that it's an upper bound on the longest sequential part of the code, which is effectively a lower bound on overall runtime. I'll edit that post.

kayvonf

@acappiello: One further clarification: I wouldn't say span is a bound on the "longest sequential part of the program," since that definition makes it sound like a program with multiple sequential parts is bounded only by the longest of those parts. A better way to think about a program's span (a.k.a. its "depth") is to consider the longest sequence of dependent operations in the program. As @pebbled says when referring to the expression-tree example, the span is the longest path from the root of the expression tree to a leaf. Since none of these dependent operations can be carried out in parallel, no parallel program can finish faster than the time it takes to execute this sequence.

Clarification: I also want to point out that in this class theory and practice don't always meet up, because theory does not model all the factors that contribute to performance in a real system. We've also learned in this class about caching effects that might cause a parallel program to run with superlinear performance and thus seemingly outperform the bound set by span if wall-clock time were measured. This doesn't violate the notion of span. It just means that the dependent sequence of instructions was somehow carried out faster in the parallel configuration than in the serial one, not because of parallelization of this sequence (which is impossible due to the dependencies), but because of some other system-level effect.