I'm having trouble understanding what exactly a deterministic transformation is...
@IntergalacticPeanutMaker I think one reason for requiring both immutability and deterministic transformations is that we want to be able to replicate or recompute an RDD easily, which is what gives us fault tolerance.
Deterministic -- invoking the same operation on the same input RDD will always produce the same result. That is, if X is an RDD, and T is a transformation, then:
Y = spark.T(X)
Z = spark.T(X)
Then Y is equivalent to Z.
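To make the Y/Z example concrete, here is a minimal sketch in plain Python rather than Spark. The names deterministic_t and nondeterministic_t are made up for illustration, and ordinary lists stand in for RDDs. The first function depends only on its input; the second also depends on hidden state (an unseeded random number generator), so two calls on the same input are not guaranteed to agree.

```python
import random

def deterministic_t(xs):
    # Output depends only on the input: same xs in, same list out.
    return [x * 2 for x in xs]

def nondeterministic_t(xs):
    # Output also depends on an unseeded RNG, so repeated calls can differ.
    return [x + random.random() for x in xs]

X = [1, 2, 3]          # stand-in for the input RDD
Y = deterministic_t(X)
Z = deterministic_t(X)
same_deterministic = (Y == Z)

# Two runs of the random version on the same input; equality is not guaranteed.
same_random = (nondeterministic_t(X) == nondeterministic_t(X))

print("deterministic runs agree:", same_deterministic)
print("random runs agree:", same_random)
```

Only the first kind of transformation gives the "Y is equivalent to Z" guarantee described above.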
When we apply a transformation to an RDD and get back a new RDD, do both of them have to be in memory at the same time? Or is data swapped out to durable storage when memory is insufficient?
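I think the deterministic property is crucial for fault tolerance: if part of an RDD is lost, it can be recomputed from its input only if re-running the same transformation is guaranteed to produce the same data.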
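Here is a rough sketch of why determinism and immutability together make recovery possible. This is a toy model in plain Python, not Spark's API: ToyRDD, its map method, and recompute are hypothetical names, and a "partition" is just a list. Each derived ToyRDD remembers its parent and the function that produced it, so a lost partition can be rebuilt by re-applying that function to the parent's untouched data.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: partitions plus the info needed to rebuild them."""

    def __init__(self, partitions, parent=None, fn=None):
        self.partitions = partitions  # list of lists; None marks a lost partition
        self.parent = parent          # the ToyRDD this one was derived from
        self.fn = fn                  # the per-element function that derived it

    def map(self, fn):
        # Build a new ToyRDD; the parent's partitions are never modified.
        new_parts = [[fn(x) for x in part] for part in self.partitions]
        return ToyRDD(new_parts, parent=self, fn=fn)

    def recompute(self, i):
        # Rebuild partition i by re-applying fn to the parent's partition i.
        self.partitions[i] = [self.fn(x) for x in self.parent.partitions[i]]
        return self.partitions[i]

base = ToyRDD([[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)

original = list(squared.partitions[1])  # copy of the data before the "failure"
squared.partitions[1] = None            # simulate losing one partition
recovered = squared.recompute(1)

print("recovered partition matches original:", recovered == original)
```

The recovery only works because the parent was never mutated and the mapped function returns the same output for the same input. If either assumption failed, the recomputed partition could silently differ from the one that was lost.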