Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

In-Memory Distributed Computing using Spark

Slide 26 of 44

msfernan

I had a question about the abstraction and implementation of RDDs. Professor Kayvon explained it to me, and I thought his reply was helpful. Here it is below.

"The main idea of an RDD is that it is an abstract collection (a sequence). That abstraction can be implemented by being backed by memory (like an array) or the elements can be computed on demand as needed. So I would say that the RDD abstraction allows us to execute a full program -- consisting of a sequence of operations on RDDs -- without ever having to materialize all of the RDDs (or even all parts of a single RDD) in memory at once." - Kayvon Fatahalian

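To make the distinction in the quote concrete, here is a minimal sketch in plain Python. It is only an analogy and not Spark's API: the names materialized_map, lazy_map, lazy_filter, and pipeline are hypothetical, and Python generators stand in for the "computed on demand" implementation of the sequence abstraction.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical helpers for illustration only; none of these are Spark functions.

def materialized_map(f: Callable[[int], int], xs: List[int]) -> List[int]:
    """Array-backed implementation: the whole result is stored in memory."""
    return [f(x) for x in xs]

def lazy_map(f: Callable[[int], int], xs: Iterable[int]) -> Iterator[int]:
    """On-demand implementation: each element is computed only when requested."""
    for x in xs:
        yield f(x)

def lazy_filter(pred: Callable[[int], bool], xs: Iterable[int]) -> Iterator[int]:
    """On-demand filter: pulls one element at a time from its input."""
    for x in xs:
        if pred(x):
            yield x

def pipeline(n: int) -> int:
    """Chain two operations and reduce, without building any intermediate list."""
    squares = lazy_map(lambda x: x * x, range(n))       # never fully materialized
    evens = lazy_filter(lambda x: x % 2 == 0, squares)  # never fully materialized
    return sum(evens)  # the reduction pulls elements through the chain one by one

if __name__ == "__main__":
    print(pipeline(10))
```

Both map functions expose the same sequence of values to the caller. The difference is that pipeline holds only one element of each intermediate sequence at a time, which is the property the quote describes for a sequence of operations on RDDs.
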
pkoenig10

This comment is super helpful. It clearly highlights the benefits of RDDs. So are there a variety of RDD implementations? Could multiple implementations even be used side by side, with the system choosing among them based on the sequence of operations for further optimization?