kayvonf

Question: Assume node 1 goes down. Describe all the steps the Spark runtime must perform to reconstruct the missing partitions of the RDD ip.

ChandlerBing

Spark will first have to reload the blocks "15418log.txt block 2" and "15418log.txt block 3" from disk on other nodes that hold replicas of those blocks. It then consults the lineage of RDD ip to determine which transformations were applied and in what order, and replays that sequence of transformations on the reloaded blocks to regenerate the lost partitions of ip.
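To make the replay concrete, here is a minimal sketch of how ip might be built from 15418log.txt. The transformations, file path, and field layout below are hypothetical (they are not taken from the slide); the point is only that the lineage is the recorded chain of map calls, and recovery replays exactly that chain on the reloaded blocks.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageSketch {
  def main(args: Array[String]): Unit = {
    // Local mode just so the sketch runs standalone; on a cluster the input
    // blocks would be distributed across worker nodes as shown on the slide.
    val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Each block of the input file becomes one partition of `lines`.
    val lines = sc.textFile("15418log.txt")

    // Hypothetical transformations producing RDD `ip`: Spark records this
    // chain (the lineage) on the driver but does not run it until an action.
    val fields = lines.map(line => line.split(" "))
    val ip     = fields.map(f => f(0))   // assume the IP address is the first field

    // An action forces evaluation; each partition of `ip` is computed on a
    // worker that holds the corresponding input block.
    println(ip.count())

    // If node 1 now fails, only its partitions of `ip` are recomputed: the
    // corresponding input blocks are reloaded on other nodes and this exact
    // map chain is replayed on just those partitions.
    sc.stop()
  }
}
```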

kayvonf

@ChandlerBing. Correct, but how does Spark know that it needs to reload those blocks from disk? (In other words, how do we determine that from the knowledge that we've lost all partitions of RDD ip on node 1?)

landuo

I think the Spark driver keeps track of which partitions each worker node holds, along with the lineage of every RDD. Once the driver detects the failure of node 1, it looks up the lineage of the partitions that were stored on node 1 and schedules another node to reconstruct the lost partitions of RDD ip from the input data and the recorded transformation steps.

Source: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
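Continuing the sketch from the earlier comment: the RDD API exposes toDebugString, which prints the lineage the driver records for an RDD and its recursive dependencies. This per-RDD dependency information is what the driver combines with its knowledge of the failed node when scheduling recomputation.

```scala
// Assumes the RDD `ip` from the sketch above (place after the count() call).
// toDebugString is part of the public RDD API; it prints `ip` and its
// recursive dependencies, i.e. the lineage the driver consults on failure.
println(ip.toDebugString)
```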

solovoy

Does this mean the lineage is local (and thus different) on each node?

zhiyuany

@solovoy, no. Lineage is tracked per RDD, since all partitions of the same RDD share the same lineage. For reconstruction, the driver also tracks where each partition is located, and it combines that location information with the RDD's lineage to recompute only the lost partitions.
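As a further sketch under the same hypothetical example: for the input RDD (`lines` above), preferredLocations reports which nodes host the block backing each partition. The scheduler combines this per-partition location information with the lineage when deciding where to recompute the lost partitions of ip.

```scala
// Assumes the RDD `lines` from the earlier sketch. For an RDD read from a
// distributed file system, preferredLocations(p) lists the nodes holding the
// block that backs partition p (it may be empty in local mode).
lines.partitions.foreach { p =>
  println(s"partition ${p.index}: ${lines.preferredLocations(p).mkString(", ")}")
}
```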