Slide View : 15-418/618 Spring 2014

Programming for Performance: Work Distribution

Previous | Next --- Slide 13 of 46

Back to Lecture Thumbnails

shabnam

In which category would we put frameworks like Hadoop?

This comment was marked helpful 0 times.

mchoquet

I think of Hadoop as closest to a data-parallel model: you specify some map and reduce operation that should be applied to all of the data in your dataset, and the framework handles the scheduling and communication.

This comment was marked helpful 0 times.

kayvonf

I agree.

This comment was marked helpful 0 times.

yihuaf

In addition, Hadoop is designed to run on a cluster of machines that does not share any disk or memory space. This design choice allows Hadoop to run on out of box commodity hardwares. Conceptually, Hadoop's abstraction is closer to data-parallel model, but Hadoop is implemented a little differently from the data-parallel described here. If you wish to explore, you can read about how MapReduce is implemented in Google's paper.

This comment was marked helpful 0 times.