Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

In-Memory Distributed Computing using Spark

Slide 42 of 44

Renegade

Spark is clearly a huge improvement over MapReduce. However, for ML training on big data, Spark doesn't perform as well as frameworks that introduce asynchrony. The reason is the Bulk Synchronous Parallel (BSP) model adopted by MapReduce-like systems such as Hadoop and Spark: every worker must wait at a barrier for the stragglers at the end of each iteration, so each iteration runs only as fast as its slowest worker.
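To make the straggler cost concrete, here is a rough sketch in plain Python (not Spark code). The function names `bsp_time` and `async_time` and the timing numbers are made up for illustration. It only models wall-clock time and ignores the effect that stale updates can have on convergence in asynchronous training.

```python
def bsp_time(times):
    """Total time with a barrier after every iteration.

    times[w][i] is the (hypothetical) time worker w spends on iteration i.
    """
    # Each iteration costs as much as its slowest worker.
    return sum(max(step) for step in zip(*times))


def async_time(times):
    """Total time when workers never wait for each other."""
    # Each worker runs at its own pace; we are done when the last one finishes.
    return max(sum(worker) for worker in times)


# Three workers, four iterations. A different worker straggles each time.
times = [
    [1, 1, 5, 1],
    [1, 5, 1, 1],
    [5, 1, 1, 1],
]

print(bsp_time(times))    # 5 + 5 + 5 + 1 = 16
print(async_time(times))  # every worker does 8 units of work, so 8
```

Every worker does the same total amount of work here, but under BSP the stragglers' delays add up across iterations instead of overlapping.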

sharangc

This is a great article comparing Hadoop and Spark. It looks at how they compare in ease of use, speed, and combining SQL, streaming, and complex analytics. Finally, it also discusses MapReduce from the perspective of both Spark and Hadoop.