Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Previous | Next --- Slide 34 of 44

misaka-10032

Another solution is to use the stale parameters. As long as we use a proper staleness parameter, it's able to converge faster thanks to the reduced synchronization cost.

418_touhenying

The impact of convergence of SGD seems like race conditions, but maybe those are not so important because machine learning allows certain errors?

chuangxuean

This approach seems like it will tolerate a large amount of errors due to each cluster computing on very different set of data. Is that something that we allow for?

@418_touhenying @chuangxuean, this is allowed, it will still most likely to converge even the parameter server applies subgradients on a state model to a fresher model. Actually there are three typical kinds of synchronization protocol, namely bulk synchronization, asynchronize and bounded delay synchronization. You should refer to this paper for more information on a detailed discussion of communications of parameter servers.