Previous | Next --- Slide 7 of 48
Back to Lecture Thumbnails

I am not sure about the definition of S.

If a program has following dependencies: 1.S->A->B->C->D->T and 2.S->E->T. Start from S and ends at T, then how we can define the fraction of sequential execution? Is it the (longest critical path) / (sum of length of all paths)? In this case it is 5/(5+2)?


@Fantasy. I think you are correct. $s$ is the span divided by all instructions to be carried out, which is the sum of length of all paths in your example.


The derivation of Amdahl's law is sort of described in the next few slides, but I don't think it's totally clear, so I'll summarize it here.

Let T(n) denote the time to execute your favorite program on an n-processor machine. Let's just define T(1) = 1. And we see that the speedup of using an n-processor machine is T(1)/T(n). We have T(n) = [stuff that must be done sequentially] + [stuff that can be done in parallel divided over all n processors] = S + (1-S)/n. Thus speedup = T(1)/T(n) = 1/(S + (1-S)/n) <= 1/S.


For the question of @Fantasy, if S,A,B,C,D,E,T can be executed parallel respectively, the fraction of sequential execution is 0.

Let's think about another scenario, if S, A, B, C, D, E, T are all single instructions, and S->A means instruction A need the output of instruction S. So anyone know what is the number of S with the dependencies S->A->B->C->D->T and S->E->T ?

I think two part of (A B C D) and (T) can be execute at the same time, but it's hard to find out the number of S.


@Fantasy @Araina Hi, I suppose the "inherently sequential" here refers to that a part itself should be sequential, rather than that dependency exists between different parts. Concretely, say a program has two parts, A and B, and A depends on the result of B, i.e. A -> B as the notation above. Considering the case that A and B both can be parallelized. With a zillion cores, we can reduce the overall run time to 0, and thus S = 0. Hence, the dependency between A and B does not matter as long as the execution time of the dependent part can be ignored.


The easiest way to think about it that the limit of total execution time as the execution time of parallelized code approaches zero is the execution time of sequential code. Thus speed up 1/s.


To add to what @thomasts and @PandaX said, another way to think about maximum possible speed up is - if we have an infinite number of processors, then the time taken by n processors to finish the parallel part of the execution = $ {(1 - S)\over n}$, (n -> $ \infty $), is infinitesimally small (~ 0). And so the only time that matters is the time taken to finish the sequential part of the execution and so the speed up is $ {((1 - S) + S)\over (S + 0)} $ which is, $ {1\over S} $.