Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Parallel Deep Network Training

Previous | Next --- Slide 37 of 44

Back to Lecture Thumbnails

jellybean

In this case we can partition the parameters so that they are split among multiple nodes.

jsunseri

To distinguish the last case from this one: in the last few slides, we discussed how parameter values could be sharded across multiple servers to reduce the adverse effects of contention on latency - all workers must communicate with the parameter servers, both to update their own parameter values and to update the parameter values on the servers. Those communication costs become greater if there are more parameters to send back and forth, but we can reduce individual transfer times (and effectively pipeline the transferring of the full set of parameters) by having multiple parameter servers with disjoint subsets of the parameters stored on each. Now we consider what happens if the number of parameters becomes so great that it becomes impractical for a worker node to work on all of them at once - each worker now may work on a subset of the parameters at a time. This would also reduce contention for the parameters servers, since only a subset of the worker nodes would need to communicate with each one.