I am wondering whether this kind of asynchronous update could prevent the training from ever converging.
crow
There are provable convergence guarantees for asynchronous SGD as well.
pdp
The number of steps SGD needs to reach an optimum (and hence the time to converge) may increase, since each update uses only a subset of the training data rather than the full dataset, but the speedup from parallelization typically more than compensates for the extra steps.
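To make the discussion concrete, here is a minimal Hogwild-style sketch of asynchronous SGD: several workers read and update a shared parameter vector without any locking, each using its own random mini-batch, so updates may be based on slightly stale parameters. The toy least-squares problem, variable names, and hyperparameters are all illustrative choices, not anything from a specific library.

```python
import threading
import numpy as np

# Toy convex problem: minimize mean((X @ w - y)**2) over w.
rng = np.random.default_rng(0)
n, d = 2000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true

# Shared parameters, read and written lock-free by all workers.
w = np.zeros(d)

def worker(steps, lr=0.01, batch=32):
    for _ in range(steps):
        idx = np.random.randint(0, n, size=batch)
        Xb, yb = X[idx], y[idx]
        # Gradient computed from a possibly-stale snapshot of w.
        grad = (2.0 / batch) * Xb.T @ (Xb @ w - yb)
        # In-place, unsynchronized update of the shared parameters.
        w[:] -= lr * grad

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = np.mean((X @ w - y) ** 2)
print(f"final mean squared error: {loss:.6f}")
```

Even with races between workers (lost or stale updates), the shared parameters still converge on this problem, which matches the intuition above: individual steps are noisier, but many workers take them in parallel.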