Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

In-Memory Distributed Computing using Spark

Slide 16 of 44

fleventyfive

"This is possible since inputs reside in persistent storage (distributed file system)" --> Does this line assume that the whole data set is resident on each node, and that informing a node about the tasks it has to do to make up for the failed node is sufficient?

Araina

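No, each node does not store the whole data set. Spark reads its input through the HDFS API. In HDFS, data is split into blocks that are stored on DataNodes, and each block is replicated on several different nodes (three by default). So when a node fails, its tasks can be rerun on another node that reads a surviving replica of the same blocks.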
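
To make the replication point concrete, here is a toy sketch in Python. It is not HDFS or Spark code: the function names `place_blocks` and `reschedule` are made up for illustration, and the round-robin placement is an assumption (real HDFS uses a rack-aware placement policy). It only shows that with replicated blocks, no node needs the whole data set for a failed node's tasks to be redone elsewhere.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy, chosen only to keep the example small.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def reschedule(placement, assignment, failed):
    """Move every task of the failed node to a surviving replica holder."""
    new_assignment = {}
    for block, node in assignment.items():
        if node != failed:
            new_assignment[block] = node  # task unaffected by the failure
            continue
        survivors = [n for n in placement[block] if n != failed]
        if not survivors:
            raise RuntimeError(f"block {block} has no surviving replica")
        new_assignment[block] = survivors[0]
    return new_assignment


nodes = ["node0", "node1", "node2", "node3"]
placement = place_blocks(num_blocks=6, nodes=nodes, replication=3)

# Initially, run each block's task on the first node that holds the block.
assignment = {block: replicas[0] for block, replicas in placement.items()}

# node1 fails: its tasks move to other nodes that hold a copy of the block.
recovered = reschedule(placement, assignment, failed="node1")
for block in sorted(recovered):
    print(block, assignment[block], "->", recovered[block])
```

In this run, blocks 1 and 5 were assigned to node1, and both are rescheduled onto node2, which already holds a replica of each. No node stores all six blocks.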

Perpendicular

Besides reducing the total time of the computation, moving computation to the data also has energy-efficiency benefits (which may be a huge concern in big data centers).