holard

The last bullet point is what allows Google to use relatively cheap hardware in its data centers. The cost of a hardware failure is much lower than in the supercomputing case, so it is more efficient to simply use cheaper hardware in larger quantities.

Firephinx

If anyone is curious, here is a video about how Google protects and discards hard drives: https://www.youtube.com/watch?v=cLory3qLoY8

Levy

Since tasks in MapReduce are independent, there's no need for periodic checkpointing and restoring from failures. By contrast, tasks on each node in an HPC job are highly dependent (e.g., they use the bulk synchronous model and move forward in lock-step), so they need checkpointing to tolerate failures.
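
To make the contrast concrete, here is a minimal sketch (not Google's actual scheduler) of why independent, idempotent tasks make recovery cheap: a failed map task is simply re-run from scratch on its input split, with no checkpoint state and no effect on other tasks. The function names and the simulated failure rate are illustrative assumptions.

```python
import random

def run_map_task(task_id, input_split):
    """Hypothetical map task; randomly 'fails' to simulate a dying worker."""
    if random.random() < 0.3:  # assumed failure rate, for illustration only
        raise RuntimeError(f"worker running task {task_id} died")
    return [(word, 1) for word in input_split.split()]

def schedule_with_reexecution(input_splits, max_attempts=5):
    """Recover from failures by re-executing tasks, not by checkpointing.

    Because map tasks are independent and idempotent, a failure affects
    only the one task, which can be rescheduled from its original input.
    """
    results = {}
    for task_id, split in enumerate(input_splits):
        for attempt in range(max_attempts):
            try:
                results[task_id] = run_map_task(task_id, split)
                break  # task succeeded; move on
            except RuntimeError:
                continue  # just reschedule; no other task is touched
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results

if __name__ == "__main__":
    splits = ["the quick brown fox", "jumps over the lazy dog"]
    print(schedule_with_reexecution(splits))
```

In a lock-step HPC program, this strategy wouldn't work: losing one node invalidates the in-flight state of every other node, which is why those systems instead checkpoint global state and roll everyone back on failure.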