crow

Processing data from the LHC can be viewed as both memory- and time-constrained: the LHC produces 300 GB of data per second, and most of it is filtered out so that only the important events are studied.

You can imagine that with more memory, it would be possible to store more of the data and perform a better analysis.

With more processing power, you could also handle more data per unit time and extract more information, even without having to store anything.
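
To make the two regimes concrete, here is a rough Python sketch (not tied to the actual LHC pipeline; the event rate, memory budget, and `is_interesting` trigger are all made-up placeholders). One loop buffers as many raw events as memory allows for later analysis; the other filters the stream on the fly and never stores it.

```python
import random
from collections import deque

MEMORY_BUDGET = 1000          # hypothetical: max raw events we can buffer
EVENTS_PER_SECOND = 10_000    # hypothetical stream rate

def is_interesting(event):
    # Hypothetical trigger: keep roughly 1% of events.
    return event["energy"] > 0.99

def generate_events(n):
    for i in range(n):
        yield {"id": i, "energy": random.random()}

# Memory-constrained view: buffer what fits, analyze offline later.
# More memory -> a larger buffer -> a better offline analysis.
buffer = deque(maxlen=MEMORY_BUDGET)
for event in generate_events(EVENTS_PER_SECOND):
    buffer.append(event)

# Time-constrained view: filter on the fly, never store the raw stream.
# More throughput -> more events inspected per second.
kept = [e for e in generate_events(EVENTS_PER_SECOND) if is_interesting(e)]

print(f"buffered {len(buffer)} raw events, kept {len(kept)} filtered events")
```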

pagerank

This slide mentions that the "amount of work done" is actually quite hard to quantify. One approach is to define it as the execution time of the same work on a single processor, but this measure is biased toward small problems: because of memory constraints, the execution time on a single processor is usually not linear in the real work done.
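
To see the bias concretely, here is a toy Python model (not from the lecture; all constants are invented) where single-processor time becomes superlinear once the working set no longer fits in memory, which inflates any "work" or speedup figure defined in terms of T_1:

```python
# Toy model (constants are made up): per-element compute cost plus a paging
# penalty for every element that spills out of a single machine's memory.
COMPUTE_COST = 1.0            # time units per element when data fits in memory
PAGING_PENALTY = 10.0         # extra time per element that spills to disk
MEMORY_CAPACITY = 1_000_000   # elements that fit in one machine's memory

def t_single(n):
    """Single-processor time: superlinear once n exceeds memory."""
    spilled = max(0, n - MEMORY_CAPACITY)
    return n * COMPUTE_COST + spilled * PAGING_PENALTY

def t_parallel(n, p):
    """Idealized parallel time: data partitioned so each node fits in memory."""
    per_node = n / p
    assert per_node <= MEMORY_CAPACITY, "assumes each partition fits in memory"
    return per_node * COMPUTE_COST

for n in (500_000, 4_000_000):
    p = 8
    s = t_single(n) / t_parallel(n, p)
    print(f"n={n:>9}: 'speedup' measured against T_1 is {s:.1f}x on {p} nodes")
```

For the small problem the measured speedup is the expected 8x, but for the large problem it looks superlinear only because T_1 includes paging overhead rather than extra useful work.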

Levy

Distributed computing frameworks like MapReduce and Spark are better thought of as time-constrained scaling. Most of the time they cannot speed up a fixed-size problem by a large factor, but when the problem size becomes very large (i.e., big data), they can still complete it in a bounded amount of time, given a larger cluster.
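
A rough analytic sketch of that distinction (the overhead and throughput constants are made up, not measured from MapReduce or Spark): with a fixed per-job coordination cost, adding machines to a fixed-size job quickly stops helping, but growing the data along with the cluster keeps the completion time bounded.

```python
# Simple analytic model (constants are made up): each job pays a fixed
# per-job coordination overhead plus perfectly parallel work.
OVERHEAD = 30.0          # seconds of scheduling/shuffle overhead per job
THROUGHPUT = 100.0       # GB processed per second per machine

def job_time(data_gb, machines):
    return OVERHEAD + data_gb / (THROUGHPUT * machines)

# Fixed-size problem: adding machines quickly stops helping.
for p in (1, 10, 100):
    print(f"fixed  1000 GB on {p:>3} machines: {job_time(1000, p):6.1f} s")

# Scaled problem (grow the data with the cluster): time stays bounded.
for p in (1, 10, 100):
    print(f"{1000 * p:>6} GB on {p:>3} machines: {job_time(1000 * p, p):6.1f} s")
```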