Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

kayvonf

Someone might want to describe this form of the Amdahl's Law equation for the class (and compare it to the form on the previous slide).

Jing

I don't see why it is

f * r / (perf(r) * n)

It seems to me that f / (perf(r) * n) is more reasonable: the total amount of work that can be parallelized, divided by the total processing power (that is, perf(r) for each processor times the number of processors we have). Why is there an r in the numerator?

kayvonf

But we only have n/r cores, each with perf(r). (We have n units of resources and use r units for each core.) So the time for the parallelizable part of the work is written like this:

f / (perf(r) * n / r)

The slide just rewrites the above as f * r / (perf(r) * n)
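
Written out as a full expression, the step above might look like the following. The serial term (1 - f) / perf(r) is not quoted in this thread; it is assumed here from the usual form of the law, with f as the parallelizable fraction of the work.

```latex
\[
\text{speedup}(f, n, r)
  = \frac{1}{\dfrac{1 - f}{\text{perf}(r)} + \dfrac{f}{\text{perf}(r) \cdot \dfrac{n}{r}}}
  = \frac{1}{\dfrac{1 - f}{\text{perf}(r)} + \dfrac{f \cdot r}{\text{perf}(r) \cdot n}}
\]
```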

Jing

I think I misread n as the number of cores we have, when it is really the total amount of compute resources we have. Thanks!

ekr

If each core has performance perf(r), then why do the smaller cores here have performance equal to 1, while the larger cores have r = 4? Shouldn't that imply the smaller ones have r = 2, since we said they each have half the sequential processing power?

kayvonf

As a simplifying assumption, we declared that perf(r) = sqrt(r). So perf(1) = 1 and perf(4) = 2.

flyne

The numerator is the total work done with 1 unit of resources, whose performance is assumed to be 1 (this would be the total time taken). The denominator is (the work in the sequential part) / (performance of a core with r resources) + (the work in the parallel part) / (performance of a core with r resources * n/r cores). This would be the total time taken on a heterogeneous processor.

sanchuah

"n" is the total number of processing "resources". Resources can be defined differently depending on context; one example definition is the number of transistors. "r" is the amount of "resources" per core. Therefore, "n/r" equals the number of cores.

The revised Amdahl's law describes speedup = 1 / [((serial part) / (performance of one core)) + ((parallelized part) / (sum of the performance of all cores))]

landuo

The sequential part of the work is executed by one core with r units of computing resources, which has performance perf(r). The parallel part is executed by n/r cores, each with performance perf(r). So the entire work, denoted "1", is done by combining the sequential and parallel parts.
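
To make the descriptions above concrete, here is a minimal Python sketch of the formula, assuming perf(r) = sqrt(r) as stated earlier in the thread. The function names and the sample values f = 0.5 and n = 16 are made up for illustration and do not come from the slide.

```python
import math


def perf(r):
    """Performance of one core built from r resource units.

    Uses the thread's simplifying assumption perf(r) = sqrt(r).
    """
    return math.sqrt(r)


def speedup(f, n, r):
    """Speedup relative to a single core built from 1 resource unit.

    f: fraction of the work that can be parallelized (0 <= f <= 1)
    n: total resource units on the chip
    r: resource units spent on each core, so there are n / r cores
    """
    num_cores = n / r
    serial_time = (1 - f) / perf(r)            # one core runs the serial part
    parallel_time = f / (perf(r) * num_cores)  # all n / r cores share the parallel part
    return 1 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Hypothetical values: half the work is parallel, 16 resource units total.
    for r in (1, 4, 16):
        print(f"r={r:2d}  cores={16 // r:2d}  speedup={speedup(0.5, 16, r):.2f}")
```

With these made-up numbers, fewer but larger cores come out ahead because half of the work is serial; raising f shifts the balance toward many small cores.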