cluo1

I am wondering whether this kind of asynchronous update could prevent training from ever converging.

crow

There are provable convergence guarantees for asynchronous SGD as well.
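
For concreteness, here is a minimal sketch of the lock-free ("Hogwild!"-style) update pattern such guarantees are typically stated for: several workers read the shared parameters and write gradient updates back without any locking, so some updates are computed from stale values. Everything here is illustrative (the toy least-squares problem, step size, step count, and thread count are made up), and since CPython's GIL limits true thread parallelism, real implementations use shared memory across processes or native code:

```python
import threading
import numpy as np

# Toy problem: least-squares regression on synthetic data.
rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

w = np.zeros(d)  # shared parameter vector, read and written without locks

def worker(seed, num_steps=5000, lr=0.01):
    local_rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        i = local_rng.integers(n)              # sample one training example
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]  # gradient from a possibly stale w
        w[:] = w - lr * grad                   # unsynchronized in-place update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to optimum:", np.linalg.norm(w - w_true))
```

Despite the races, the shared vector still drifts toward the optimum; the analyses for asynchronous SGD bound how much the stale reads can hurt convergence.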

pdp

It could happen that SGD needs more steps (more time to reach the optimum), since each update is computed from only a subset of the training dataset rather than the whole thing, but the speedup from parallelization can more than compensate for the extra steps needed to converge.