Slide 7 of 31
dsaksena

I was confused about one thing.

Can't we call the reducer on node 0 first for "Safari iOS values 0", then for "Chrome Glass values 0", then for "Chrome values 0"?

Similarly, each node would call the reducer on the values local to it, and at the end we would combine all the reducer outputs, thus maintaining locality.

But after further discussion I found that the key word explaining why we don't do this is "combine": we are given only a reducer and no combiner, and the reducer's intermediate results cannot be merged without a combiner. Given these constraints, the solution in the slide above seems great.

kayvonf

@dsaksena. This would violate the reducer semantics that I specified when I introduced mapReduceJob on slide 3. The semantics of the reducer are that it receives a key and a list of all values associated with that key. The runtime system doesn't have any visibility into how the application implemented the reducer, so it's not safe for the system to assume that your optimization is valid.

Your optimization is in fact the Hadoop combiner. http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
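To make the point concrete, here is a minimal sketch of what goes wrong if the runtime simply reuses the reducer on per-node partial results. It assumes a counting reducer written as `len(values)`; the key `"Chrome"` and the two per-node value lists are made up for illustration and are not taken from the slide.

```python
# Hypothetical intermediate values for one key, split across two nodes.
node0_values = [1, 1]      # two tuples for key "Chrome" emitted on node 0
node1_values = [1, 1, 1]   # three tuples for the same key emitted on node 1


def reducer(key, values):
    """Counting reducer: written assuming it sees ALL values for the key."""
    return len(values)


# Reducer semantics as specified: one call with every value for the key.
correct = reducer("Chrome", node0_values + node1_values)

# Per-node reducer calls, followed by the same reducer on the partial results.
partials = [reducer("Chrome", node0_values), reducer("Chrome", node1_values)]
naive = reducer("Chrome", partials)

# Merging partial counts needs different logic (a sum): a combiner-style merge.
merged = sum(partials)

print(correct, naive, merged)  # prints: 5 2 5
```

The per-node calls are fine on their own, but the runtime has no way to know that partial counts must be summed rather than counted again. That merge logic is exactly the piece the application would have to supply.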

Kapteyn

Given that all our reducer is doing is counting the number of instances of a given key, in this case it seems like overkill to send the intermediate data to the node responsible for that key.

Instead of incurring the communication cost of sending each key-value pair to the appropriate node, or storing it in the file system for a given node to read later, each node can just count the number of instances of each key as it does the mapping. After running map, all nodes can send their counts to a head node, which runs a much smaller reduce over the per-node key counts to obtain the desired values.
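Roughly, the scheme I have in mind looks like the sketch below. The log-line format, the `browser_of` mapper, and the two-node input are all hypothetical, and this only works because the reduction here happens to be a count.

```python
from collections import Counter


def map_and_count(lines, mapper):
    """Run the mapper over one node's input and count the keys locally."""
    counts = Counter()
    for line in lines:
        counts[mapper(line)] += 1
    return counts


def merge_counts(per_node_counts):
    """Head node: add up the per-node counts for each key."""
    total = Counter()
    for counts in per_node_counts:
        total.update(counts)  # Counter.update adds counts, it does not overwrite
    return total


def browser_of(line):
    """Hypothetical mapper: the browser name is the first field of the line."""
    return line.split()[0]


# Hypothetical input: one list of log lines per node.
node_inputs = [
    ["Chrome /a", "Safari /b", "Chrome /c"],
    ["Safari /d", "Chrome /e"],
]

per_node = [map_and_count(lines, browser_of) for lines in node_inputs]
totals = merge_counts(per_node)
print(dict(totals))
```

Each node sends one small table of counts instead of one tuple per input line.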

kayvonf

@kapteyn. Your alternative implementation would be correct if the system knew that the logic of the reducer was to count the tuples. However, the system does not know the implementation of the reducer, since that is a function provided by the application. To implement your optimization, the system would need to provide additional interfaces that let an application tell the system about the behavior of the reducer.
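As one illustration of what such an additional interface might look like, here is a toy single-process sketch. The `map_reduce_job` signature, the optional `combiner` parameter, and the sample input are assumptions made for this example; they are not the interface from slide 3.

```python
from collections import defaultdict


def map_reduce_job(node_inputs, mapper, reducer, combiner=None):
    """Toy model of a job; node_inputs holds one list of records per node.

    The runtime pre-aggregates on each node only if the application
    supplies a combiner, i.e. explicitly states that doing so is safe.
    """
    shuffled = defaultdict(list)
    for records in node_inputs:
        local = defaultdict(list)
        for record in records:
            key, value = mapper(record)
            local[key].append(value)
        for key, values in local.items():
            if combiner is not None:
                # One pre-aggregated value per key leaves this node.
                values = [combiner(key, values)]
            shuffled[key].extend(values)
    # The reducer still sees every value that was shuffled for its key.
    return {key: reducer(key, values) for key, values in shuffled.items()}


def mapper(line):
    """Hypothetical mapper: emit (browser, 1) for each log line."""
    return line.split()[0], 1


def sum_values(key, values):
    """Summation is associative, so it can serve as combiner and reducer."""
    return sum(values)


node_inputs = [
    ["Chrome /a", "Safari /b", "Chrome /c"],
    ["Safari /d", "Chrome /e"],
]

without_combiner = map_reduce_job(node_inputs, mapper, sum_values)
with_combiner = map_reduce_job(node_inputs, mapper, sum_values, combiner=sum_values)
print(without_combiner, with_combiner)
```

With no combiner supplied, the sketch falls back to the original semantics, so an application that says nothing about its reducer still gets a correct result.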