mak

Which process (and at what stage) gathers all the values associated with a key and finds the total number of unique keys? Does the gathering happen after the mapper finishes its computation?

rsvaidya

@mak Yes. After the mapper completes its job, the framework internally groups the results by user_agent, and then the reducer is called to find the count for each unique user_agent. This grouping is done internally and is generally not the user's concern. The user only writes the mapper and the reducer, and the framework takes care of the rest.
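To make the stages concrete, here is a minimal single-machine sketch of the map, group-by-key, and reduce steps. It is not the framework's actual implementation: the tab-separated log format, the sample lines, and the names `mapper`, `group_by_key`, `reducer`, and `run_job` are all assumptions made for illustration. In a real system the grouping step is the part the framework runs for you, across machines.

```python
from collections import defaultdict

def mapper(line):
    # Assumed log format: tab-separated fields, user_agent in the last field.
    user_agent = line.rstrip("\n").split("\t")[-1]
    return (user_agent, 1)

def group_by_key(pairs):
    # Stand-in for the framework's internal grouping step:
    # collect every value emitted for the same key into one list.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Called once per unique key with all of that key's values.
    return (key, sum(values))

def run_job(lines):
    mapped = [mapper(line) for line in lines]      # user code
    grouped = group_by_key(mapped)                 # framework's job
    return dict(reducer(k, v) for k, v in grouped.items())  # user code

# Hypothetical sample input.
log_lines = [
    "10.0.0.1\tGET /index.html\tFirefox",
    "10.0.0.2\tGET /about.html\tChrome",
    "10.0.0.3\tGET /index.html\tFirefox",
    "10.0.0.4\tGET /index.html\tSafari",
]

counts = run_job(log_lines)
print(counts)
```

The number of unique keys is simply the number of groups produced by the grouping step, which is `len(counts)` here.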

yes

@mak The programming abstraction used here is implemented as RDDs.

POTUS

@yes I'm pretty sure it's not RDDs; it's a distributed system like HDFS, mentioned 2 slides before this. RDDs are a data structure that exists exclusively in Spark.