Slide 7 of 47

Derivation:

Let $T$ be the total time to complete all the work sequentially on one base core. A $(1-f)$ fraction of the work is sequential and an $f$ fraction is parallelizable, so $(1-f)T$ of the time is spent in the sequential part. Using $r$ resources instead of $1$ gives a performance factor of $perf(r)$, so the runtime of the sequential part is $\frac{(1-f)T}{perf(r)}$. For the parallel portion, there are $\frac{n}{r}$ cores ($n$ total resources, $r$ per core). The $fT$ of work is improved by the $perf(r)$ factor and split evenly among the $\frac{n}{r}$ cores, so the parallel portion runs in $\frac{fT}{\frac{n}{r} \cdot perf(r)}$. The total runtime is therefore $\frac{(1-f)T}{perf(r)} + \frac{fT}{\frac{n}{r} \cdot perf(r)}$. Hence $speedup(f,n,r) = \frac{T}{\frac{(1-f)T}{perf(r)} + \frac{fT}{\frac{n}{r} \cdot perf(r)}} = \frac{1}{\frac{1-f}{perf(r)} + \frac{f}{\frac{n}{r} \cdot perf(r)}}$.

Note that, strictly speaking, the runtime using 1 base core is $\frac{T}{perf(1)}$. But by convention $perf(1) = 1$, so the result is the same.


Intuitively, as the total resources $n$ increase, the parallel term $\frac{f}{\frac{n}{r} \cdot perf(r)}$ in the denominator shrinks, increasing the potential speedup.
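The derivation above can be sketched numerically. The code below is a minimal sketch: the slide does not specify a $perf(r)$ function, so the common assumption $perf(r) = \sqrt{r}$ (sublinear return on resources per core) is used here purely for illustration.

```python
import math

def perf(r):
    # Assumed performance model (not given on the slide):
    # perf(r) = sqrt(r), i.e., sublinear returns from a bigger core.
    return math.sqrt(r)

def speedup(f, n, r):
    # speedup(f, n, r) = 1 / ((1-f)/perf(r) + f/((n/r) * perf(r)))
    # f: parallelizable fraction, n: total resources, r: resources per core.
    return 1.0 / ((1 - f) / perf(r) + f / ((n / r) * perf(r)))

# Sanity check: one base core gives no speedup.
print(speedup(0.5, 1, 1))

# More total resources shrink the parallel term, raising speedup.
print(speedup(0.9, 16, 4), speedup(0.9, 64, 4))
```

Trying a few values of $r$ for fixed $n$ also shows the tradeoff: a larger $r$ speeds up the sequential part but leaves fewer cores for the parallel part.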