In lecture, Kayvon mentioned how most practical solutions take an approximate solution by "wandering" in the right direction as opposed to taking the best step each time. How would this tie in with the code on the previous slide? i.e. what kind of changes would you make to that pseudocode?
@Calloc, the change is that we remove the barrier and update the parameters asynchronously (using a parameter server). More detailed information can be found in later slides and in this paper.
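To make the change concrete, here is a minimal sketch of the asynchronous idea in Python. This is not the slide's pseudocode; it's a toy Hogwild!-style example I made up: several workers update a shared parameter with no barrier and no lock, so reads may be stale, yet training still converges on a simple least-squares problem. The problem setup (fitting `w` to `y = 3x`) and all names are hypothetical.

```python
import threading
import random

# Hypothetical toy problem: fit w to minimize (w*x - y)^2 where y = 3x.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

# Shared parameter, updated with no barrier and no lock (Hogwild!-style).
# Stale reads and lost updates are tolerated; error tolerance means SGD
# still drifts toward the minimum.
w = [0.0]

def worker(steps, lr=0.05):
    for _ in range(steps):
        x, y = random.choice(data)
        grad = 2.0 * (w[0] * x - y) * x  # gradient of (w*x - y)^2 w.r.t. w
        w[0] -= lr * grad                # racy update, no synchronization

threads = [threading.Thread(target=worker, args=(500,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(w[0])  # typically close to 3.0 despite the races
```

A real parameter-server version would push `grad` to a server and pull fresh (possibly stale) weights instead of touching shared memory directly, but the key edit to the slide's pseudocode is the same: drop the barrier between the gradient step and the parameter update.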
Two important properties of machine learning programs (including DNNs) make the asynchronous solution feasible: error tolerance and non-uniform convergence. Error tolerance, which Kayvon mentioned in class, means the parameters do not need to be updated in perfect synchrony; the program will still eventually reach a minimum, either global or local. Non-uniform convergence means that different parameters can converge at vastly different iteration counts. This can lead to load imbalance at runtime, making bulk synchronization inefficient.
How do we know whether using a barrier to get more accurate parameter values will be less efficient than the asynchronous execution we're talking about, where we use "good enough" parameter values and eventually converge to the correct weights?
@maxdecmeridius One way is obviously to just try both and measure, but that might not always be possible given time and resource constraints. Failing that, the best you can do is estimate, based on an educated examination of the workload: how much staleness the model tolerates and how long workers actually spend waiting at the barrier.
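One cheap way to "just try" without a full training run is to simulate the straggler effect. The sketch below is a made-up cost model, not real training: each of 4 workers takes a random time per step; the barrier-synchronized run pays the slowest worker's cost every step, while the asynchronous run lets each worker proceed independently, so its wall-clock time is just the slowest worker's own total.

```python
import random

random.seed(0)

# Hypothetical straggler model: per-step cost drawn uniformly per worker.
workers, steps = 4, 1000
sync_time = 0.0
async_work = [0.0] * workers

for _ in range(steps):
    costs = [random.uniform(1.0, 2.0) for _ in range(workers)]
    sync_time += max(costs)      # barrier: every step waits for the slowest
    for i, c in enumerate(costs):
        async_work[i] += c       # no barrier: each worker runs at its own pace

async_time = max(async_work)     # wall clock when the last worker finishes
print(sync_time > async_time)    # barrier run is slower under stragglers
```

This only captures the synchronization cost; whether async actually wins also depends on how much extra iteration count the staler gradients cost you, which is the part you'd have to estimate or measure on the real model.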