Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

In-Memory Distributed Computing using Spark

Slide 21 of 44

IntergalacticPeanutMaker

I'm having trouble understanding what exactly a deterministic transformation is...

@IntergalacticPeanutMaker I think one reason for both immutability and deterministic transformations is that we want to be able to replicate or recompute an RDD easily, which is what gives us fault tolerance.
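A minimal plain-Python sketch of the immutability half of this argument, assuming a hypothetical `ToyRDD` class rather than Spark's real RDD type: a transformation returns a new object and leaves its input untouched, so the input is always still available to recompute the output from.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable tuple of records."""
    data: Tuple[int, ...]

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Transformations build a new ToyRDD; they never modify self.
        return ToyRDD(tuple(f(x) for x in self.data))


x = ToyRDD((1, 2, 3))
y = x.map(lambda v: v * 10)
print(x.data)  # (1, 2, 3)
print(y.data)  # (10, 20, 30)

try:
    x.data = (9, 9, 9)  # attempt an in-place modification
except FrozenInstanceError:
    print("x is immutable")
```

Because `x` can never change, re-running `x.map(...)` later yields a `ToyRDD` equal to `y`, which is the property recomputation relies on.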

kayvonf

Deterministic -- invoking the same operation on the same input RDD will always produce the same result. That is, if X is an RDD, T is a transformation, and we compute:

Y = spark.T(X)
Z = spark.T(X)

then Y is equivalent to Z.
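To make this definition concrete, here is a minimal plain-Python sketch (not the Spark API). The functions `double_all`, `unseeded_sample`, and `seeded_sample` are hypothetical stand-ins for a transformation T applied to a list X; running each one twice on the same input shows which ones satisfy Y == Z.

```python
import random
from typing import List


def double_all(rdd: List[int]) -> List[int]:
    # Deterministic: the output depends only on the input records.
    return [x * 2 for x in rdd]


def unseeded_sample(rdd: List[int]) -> List[int]:
    # Non-deterministic: each call draws fresh values from the global RNG,
    # so two calls on the same input almost always keep different records.
    return [x for x in rdd if random.random() < 0.5]


def seeded_sample(rdd: List[int], seed: int = 42) -> List[int]:
    # Deterministic: a private RNG with a fixed seed replays the same draws.
    rng = random.Random(seed)
    return [x for x in rdd if rng.random() < 0.5]


X = list(range(100))

Y, Z = double_all(X), double_all(X)
print(Y == Z)  # True

Y, Z = seeded_sample(X), seeded_sample(X)
print(Y == Z)  # True

Y, Z = unseeded_sample(X), unseeded_sample(X)
print(Y == Z)  # False with overwhelming probability
```

The sampling case shows that randomness alone does not rule out determinism: fixing the seed is one way to make a sampling transformation reproducible.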

fleventyfive

When we apply a transformation to an RDD and get back a new RDD, do both of them have to be in memory at the same time? Or is some of the data swapped out to durable storage if memory is insufficient?
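One relevant mechanism, sketched here in plain Python with generators rather than Spark itself: Spark evaluates transformations lazily and pipelines narrow transformations such as `map` and `filter` within a stage, so each record passes through the whole chain before the next one is read, and the intermediate RDD is not stored in full unless it is explicitly persisted. (When an RDD is persisted, storage levels such as `MEMORY_AND_DISK` let partitions that do not fit in memory spill to disk.) The functions `source`, `map_square`, and `filter_even` are hypothetical.

```python
from typing import Iterable, Iterator


def source(n: int) -> Iterator[int]:
    # Hypothetical input partition, produced one record at a time.
    for i in range(n):
        yield i


def map_square(records: Iterable[int]) -> Iterator[int]:
    # Narrow transformation: consumes one record, emits one record.
    for r in records:
        yield r * r


def filter_even(records: Iterable[int]) -> Iterator[int]:
    # Narrow transformation: passes a record through or drops it.
    for r in records:
        if r % 2 == 0:
            yield r


# Chaining generators fuses the two transformations: nothing runs until
# sum() pulls records, and the full "squared" dataset never exists at once.
pipeline = filter_even(map_square(source(10)))
print(sum(pipeline))  # 120
```

In this sketch only one record from each stage is live at a time, which is why the input and output datasets need not both be resident in memory.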

yikesaiting

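I think the deterministic property is crucial for the fault-tolerance policy: a lost partition is rebuilt by re-running the same transformations on its inputs, and that only reproduces the original data if those transformations are deterministic.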
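A minimal plain-Python sketch of lineage-based recovery under that assumption. The parent partitions, the transformation `add_100`, and the `recover` helper are all hypothetical, not Spark's API. Losing a derived partition and replaying the recorded transformation on the immutable parent reproduces the lost data exactly, which holds only because `add_100` is deterministic.

```python
from typing import Dict, List

# Hypothetical parent RDD, split into two partitions keyed by partition id.
parent: Dict[int, List[int]] = {0: [1, 2, 3], 1: [4, 5, 6]}


def add_100(partition: List[int]) -> List[int]:
    # The recorded (deterministic) transformation: part of the child's lineage.
    return [x + 100 for x in partition]


# Materialize the child RDD one partition at a time.
child: Dict[int, List[int]] = {pid: add_100(part) for pid, part in parent.items()}
expected = {pid: list(part) for pid, part in child.items()}  # copy for comparison

# Simulate a worker failure that loses partition 1 of the child.
del child[1]


def recover(pid: int) -> List[int]:
    # Rebuild only the lost partition by replaying its lineage on the parent.
    return add_100(parent[pid])


child[1] = recover(1)
print(child[1])  # [104, 105, 106]
print(child == expected)  # True
```

Only partition 1 is recomputed; partition 0 is untouched, which is what makes lineage-based recovery cheaper than recomputing the whole RDD.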