Previous | Next --- Slide 16 of 44
Back to Lecture Thumbnails

"This is possible since inputs reside in persistent storage (distributed file system)" --> Does this line assume that the whole data set is resident on each node, and that informing a node about the tasks it has to do to make up for the failed node is sufficient?


each node does not store the whole data. Spark uses the official API of HDFS. In HDFS, data are stored mainly on data nodes by chunks and each chunk has some certain amount of copies on different nodes.


Moving computation to the data has energy efficiency benefits (which may be a huge concern in big data-centers )besides reducing total time of the computation.