This slide gives the types of many common RDD operations, but it doesn't describe the semantics of the operations. (This list is the set of transformations and actions described in the original RDD publication.)

Here is also a webpage that gives introductions and examples for specific RDD transformations.

MaxFlowMinCut

Do some of these operators assume a given data type in the RDD? For example, does cross_product assume that the RDDs contain n-dimensional vectors?

jsunseri

@MaxFlowMinCut I think the crossProduct defined above is actually a Cartesian product, which you get in Spark by calling cartesian(). It pairs every element of one RDD with every element of the other, so it doesn't assume any particular element type. The vector cross product as we commonly use it (in physics, for example) is not defined for an arbitrary number of dimensions (see Wedge Product for the thing that is defined in an arbitrary number of dimensions), so an implementation of that would probably assume three-dimensional vectors.
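To make the distinction concrete, here is a minimal sketch in plain Python (not Spark code) that models the two operations on local lists and tuples. The helper names `cartesian` and `cross3` are made up for this illustration. In Spark the pairing is done by the RDD's cartesian() method across partitions, which this sketch does not attempt to reproduce.

```python
from itertools import product


def cartesian(left, right):
    """Return every (a, b) pair with a from left and b from right.

    Hypothetical local stand-in for the pairing semantics only;
    nothing here constrains the element types.
    """
    return list(product(left, right))


def cross3(u, v):
    """Vector cross product, restricted here to 3-component vectors."""
    if len(u) != 3 or len(v) != 3:
        raise ValueError("cross3 expects two 3-dimensional vectors")
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


# Cartesian product: works on any element types (strings paired with ints here).
pairs = cartesian(["a", "b"], [1, 2, 3])

# Cross product: only meaningful for numeric 3-vectors.
normal = cross3((1, 0, 0), (0, 1, 0))
```

The Cartesian product of a 2-element and a 3-element collection has 6 pairs regardless of what the elements are, while `cross3` rejects anything that is not a pair of 3-vectors.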

You can see the documentation of the current Spark implementation here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD

This looks very similar to SML.
