Slide 20 of 43
pht

How does the system determine how often, and at what points, the state of the program should be checkpointed?

paramecinm

The step of saving checkpoints can run in parallel with the computation, while in MapReduce the stores to and loads from HDFS can only run sequentially, because each iteration's jobs depend on the data the previous iteration wrote to HDFS.

chenboy

Could we apply a fine-grained partitioning scheme to the checkpoint so that multiple machines can participate in the recovery process? I think this would address the workload imbalance problem where all nodes have to wait for a single node to redo the lost computation before they can proceed.
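To make the idea concrete, here is a minimal pure-Python sketch of partitioned recovery: the lost state is split into shards so that several recovery workers could each redo one shard in parallel, instead of one node redoing everything. All names (`partition_checkpoint`, `recover`, the identity "recompute") are invented for this illustration, not part of any real checkpointing system.

```python
# Hypothetical sketch: split a checkpoint into shards so recovery work
# can be divided across machines rather than serialized on one node.

def partition_checkpoint(state, num_parts):
    """Split a flat list of records into num_parts roughly equal shards."""
    return [state[i::num_parts] for i in range(num_parts)]

def recover(shards, recompute):
    """Each shard could be handled by a different worker; here we just map
    the recompute function over every record of every shard."""
    return [record for shard in shards for record in map(recompute, shard)]

lost_state = list(range(10))
shards = partition_checkpoint(lost_state, 4)   # 4 workers share the redo work
recovered = recover(shards, lambda r: r)       # identity "recompute" for the demo
assert sorted(recovered) == lost_state
```

With real recomputation, each of the four shards would be an independent unit of redo work, so recovery time shrinks roughly with the number of participating machines.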

atadkase

The stores to disk in Hadoop form a kind of implicit checkpoint. In Spark, however, checkpoints have to be created intentionally, since all computation is done in memory.

Firephinx

Yes, you can use .persist() (or an explicit save) to trigger intentional materialization of an RDD. In addition, the outputs of RDDs with wide dependencies are effectively saved automatically (as shuffle output), because if one of them were lost, many downstream RDDs would have to wait for it to be recomputed before they could continue, unlike an RDD with only narrow dependencies.
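The recomputation-avoidance point can be illustrated with a toy lineage model. This is a pure-Python sketch, not Spark's API: `ToyRDD` and its fields are invented names, and `persisted=True` stands in for what .persist() (or automatic materialization of shuffle output at a wide dependency) achieves, namely cutting off recomputation of everything upstream.

```python
# Toy lineage model: each node recomputes its parents on demand unless it
# is persisted, in which case a cached result is reused on later evaluations.

class ToyRDD:
    def __init__(self, compute_fn, parents=(), persisted=False):
        self.compute_fn = compute_fn
        self.parents = parents
        self.persisted = persisted
        self._cache = None
        self.recomputes = 0          # how many times this node was evaluated

    def compute(self):
        if self.persisted and self._cache is not None:
            return self._cache       # cache hit: upstream lineage is not touched
        self.recomputes += 1
        inputs = [p.compute() for p in self.parents]
        result = self.compute_fn(*inputs)
        if self.persisted:
            self._cache = result
        return result

base = ToyRDD(lambda: list(range(5)))
doubled = ToyRDD(lambda xs: [2 * x for x in xs], (base,), persisted=True)
summed = ToyRDD(lambda xs: sum(xs), (doubled,))

summed.compute()   # first evaluation walks the whole chain
summed.compute()   # re-evaluation stops at the persisted node
assert base.recomputes == 1 and doubled.recomputes == 1 and summed.recomputes == 2
```

Without `persisted=True` on the middle node, the second evaluation would recompute the entire chain, which is exactly the cost Spark avoids by persisting (or by keeping shuffle output at wide-dependency boundaries).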