Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

In-Memory Distributed Computing using Spark

Slide 43 of 43

dyzz

Spark is a powerful platform to build on because it provides robust parallelism primitives that fit its target use cases well. I wonder on which workloads the Spark versions of these libraries outperform their non-Spark counterparts. Specifically, I wonder whether they still perform better, or at least as well, on smaller datasets that don't necessarily have to be streamed from disk.
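One rough way to frame the small-dataset question is a back-of-the-envelope cost model: a distributed run pays some fixed startup and scheduling overhead but divides the per-item work across workers, while a local run pays no overhead and does all the work itself. The sketch below is only that toy model. The function names and every number passed to them are hypothetical, not measurements of Spark or of any library, and the model ignores serialization, shuffles, and stragglers.

```python
# Toy cost model (an assumption for illustration, not a Spark benchmark).

def local_time(n_items, cost_per_item):
    """Estimated seconds for a single-machine run with no framework overhead."""
    return n_items * cost_per_item


def distributed_time(n_items, cost_per_item, workers, fixed_overhead):
    """Estimated seconds when work splits evenly across workers after a fixed overhead."""
    return fixed_overhead + n_items * cost_per_item / workers


def break_even_items(cost_per_item, workers, fixed_overhead):
    """Dataset size at which the two estimates are equal under this model."""
    if workers <= 1:
        # With one worker the distributed run never catches up.
        return float("inf")
    return fixed_overhead / (cost_per_item * (1 - 1 / workers))


if __name__ == "__main__":
    # Hypothetical parameters: 1 microsecond per item, 4 workers, 0.5 s overhead.
    cost, workers, overhead = 1e-6, 4, 0.5
    for n in (1_000, 1_000_000, 10_000_000):
        print(n, local_time(n, cost), distributed_time(n, cost, workers, overhead))
    print("break-even items:", break_even_items(cost, workers, overhead))
```

Under these assumptions the local version wins below the break-even size and the distributed version wins above it, so the real answer for any given library would depend on how large its fixed overhead actually is, which would have to be measured.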