Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

In-Memory Distributed Computing using Spark

Previous | Next --- Slide 29 of 43

ggm8

There was a discussion in class about loading more than 1 line from the file at a time. I was wondering when it would be a good idea to do this and control the granularity of how many lines you read, instead of just depending on the size of the cache line and letting the cache handle this behavior?

machine6

Spark takes advantage of the narrow dependencies each step in the computation has with the intermediary data. This data is computed 'virtually' in the sense that it appears to the user that it has been computed and stored in some intermediary form as a whole, but in reality the implementation cleverly performs all operations on a single 'line' at once.