username

In answer to the second question: a high level of abstraction may prevent the most efficient execution by the system, because it can hide details of the program or architecture that a user could otherwise exploit with small, targeted changes.

POTUS

Regarding the benefits of a high level of abstraction: intricate knowledge of how the system works isn't required for someone who only needs to reach a minimum standard of performance or efficiency.

nemo

One big benefit of data-parallel systems is that the same operation can be applied in parallel across a collection of input data, as long as each application is independent of the others, which makes the computation much faster. When multiple mappings are applied to the same data one after the other, they can be fused so that each element is loaded once and has several operations performed on it; this increases arithmetic intensity, which can give a significant speedup.
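
To make the point about chained mappings concrete, here is a minimal sketch in Python. The functions `scale` and `shift` are made-up stand-ins for any two independent per-element operations; they are not from the lecture. The two-pass version materializes an intermediate collection, while the fused version does both operations per element in a single pass, so there is more arithmetic per element loaded. In plain Python this only illustrates the structure; the actual speedup depends on the system doing the fusion.

```python
def scale(x):
    # Hypothetical first mapping.
    return 2.0 * x

def shift(x):
    # Hypothetical second mapping.
    return x + 1.0

def two_passes(data):
    # Pass 1 writes an intermediate list; pass 2 reads it back.
    tmp = [scale(x) for x in data]
    return [shift(t) for t in tmp]

def fused(data):
    # Single pass: each element is read once and both operations
    # are applied before the result is written.
    return [shift(scale(x)) for x in data]

data = [0.0, 1.0, 2.0, 3.0]
assert two_passes(data) == fused(data)
print(fused(data))  # prints [1.0, 3.0, 5.0, 7.0]
```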

lya

MapReduce adopts message passing as its communication model. There is a single master node and several worker nodes. The master node is responsible for assigning and scheduling tasks on the worker nodes, while each worker node processes a partition of the data in each step. The first step is Map, where data are filtered and transformed in some way and then sorted by key. In the second step, Reduce, workers read those intermediate results and usually perform some summary operation.
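
The Map, sort-by-key, and Reduce steps described above can be sketched as a single-process word count. This is only an illustration of the data flow, assuming word counting as the task; the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are invented for this example, and there is no real master or worker scheduling here.

```python
from collections import defaultdict

def map_phase(document):
    # Map: transform one input partition into (key, value) pairs.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Group values by key and order the groups by key, standing in
    # for the sort that happens between Map and Reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    # Reduce: a summary operation over all values for one key.
    return key, sum(values)

def word_count(partitions):
    intermediate = []
    for part in partitions:
        # In a real system each partition would go to a different worker.
        intermediate.extend(map_phase(part))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate))

print(word_count(["a b a", "b c"]))  # prints {'a': 2, 'b': 2, 'c': 1}
```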

sampathchanda

@Iya, I think MapReduce is a shared address space model, since data is not sent between nodes in the form of packets; instead, addresses are communicated through the master. An address sent by one node through the master is then accessed by another node, which indicates a shared address space model. Please correct me if I am wrong.

jedi

@sampathchanda, I believe you may be mistaken. The Map tasks send key-value data to the Reducers, typically partitioned by the hash of the key. The Reducers aggregate and sort this data while the Map tasks are running and begin processing when all Map tasks have sent all their output data.
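
As a rough sketch of the partitioning step, the snippet below routes each key-value pair to a reducer chosen by a hash of the key. The helper names `partition_for` and `route` are hypothetical, and `zlib.crc32` is used only because it is stable across runs (Python's built-in `hash` for strings is randomized per process); a real framework would use its own partitioner and send the pairs over the network.

```python
import zlib

def partition_for(key, num_reducers):
    # Stable hash of the key, reduced to a reducer index.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def route(map_outputs, num_reducers):
    # One "inbox" per reducer; appending to it stands in for
    # sending a message to that reducer.
    inboxes = [[] for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            inboxes[partition_for(key, num_reducers)].append((key, value))
    for inbox in inboxes:
        # Each reducer sorts what it received before processing.
        inbox.sort()
    return inboxes

map_outputs = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]]
for index, inbox in enumerate(route(map_outputs, 3)):
    print(index, inbox)
```

Because the reducer index depends only on the key, every pair with the same key ends up at the same reducer, no matter which Map task produced it.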