Slide 13 of 43
muchanon

The semantics of the reducer require that `values` be the list of all values to be reduced. This means we cannot precompute partially reduced values; we must gather all of the values for a key so that they can be reduced as a complete set.

o__o

@muchanon, To clarify, do you mean that we need to send over all of the ("Safari iOS", 1) pairs from node 1, for instance, instead of precomputing the sum and sending over a single partial-sum value? Wouldn't the problem quickly become bandwidth-bound?

shhhh

I believe the point of spreading the data over multiple nodes is that this information cannot fit on a single node. It is therefore infeasible to gather all the information on one node, and you must compute partially reduced values.

paramecinm

@o__o Usually we use a combiner to combine values with the same key locally before the reduce phase. That alleviates the bandwidth pressure.
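To make the combiner idea concrete, here is a minimal sketch (hypothetical names, not any particular framework's API) of a word-count-style job where each node combines its own ("Safari iOS", 1)-style pairs locally, so only one pair per key crosses the network. This is only safe because summation is associative and commutative:

```python
from collections import Counter

def map_phase(records):
    # Emit a (key, 1) pair for each input record.
    return [(r, 1) for r in records]

def combine(pairs):
    # Hypothetical combiner: sum values for identical keys locally,
    # on the node that produced them, before anything is sent over
    # the network. Valid only because sum is associative/commutative.
    counts = Counter()
    for k, v in pairs:
        counts[k] += v
    return list(counts.items())

def reduce_phase(all_pairs):
    # The reducer sees the full (possibly pre-combined) set of pairs
    # for each key and produces the final totals.
    counts = Counter()
    for k, v in all_pairs:
        counts[k] += v
    return dict(counts)

# Two nodes each combine locally; only the combined pairs are "sent".
node1 = combine(map_phase(["Safari iOS", "Safari iOS", "Chrome"]))
node2 = combine(map_phase(["Safari iOS", "Chrome"]))
totals = reduce_phase(node1 + node2)
```

Node 1 ships two pairs instead of three, and the savings grow with the number of duplicate keys per node.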

crow

We mentioned possibly using faster interconnects to alleviate bandwidth concerns; with a fast enough interconnect, disk IO would become the bottleneck instead. However, interconnects are expensive. Searching for "cpu interconnect" on eBay, prices seem to be around $1000 for a single cable.

SR_94

@paramecinm @shhhh, I think we cannot run reduce on local values before sending them to the reducing node, because reduce requires that all values be delivered before the reduce job starts. In this example, precomputing the sum locally and sending that over happens to work because this particular reduce is associative, but in general we are not supposed to assume anything about the reduce job except that the node responsible for reducing must have all the values before it starts.
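The point above is worth a tiny counterexample: for a non-associative reduce, locally pre-reducing and then reducing the partial results gives the wrong answer. A mean-valued reducer (a hypothetical example, not from the slide) shows this:

```python
# Hypothetical non-associative reducer: arithmetic mean of the values.
def mean(values):
    return sum(values) / len(values)

node1 = [1.0, 2.0, 3.0]   # values for some key held on node 1
node2 = [10.0]            # values for the same key on node 2

# Correct: reduce over the complete set of values for the key.
correct = mean(node1 + node2)               # (1+2+3+10)/4 = 4.0

# Wrong: pre-reduce locally, then reduce the partial results.
partial = mean([mean(node1), mean(node2)])  # mean([2.0, 10.0]) = 6.0
```

Since the framework cannot know whether a user-supplied reducer is associative, it must deliver all values unless the user explicitly supplies a combiner that is known to be safe.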

jk2d

@SR_94 Agree. In fact, the professor mentioned this in lecture.

kayvonf

@jk2d, @SR_94. That is correct. Also see the more detailed conversation about this on the next slide.