SGD for many neural networks works asynchronously, i.e. there is no synchronized reduction for each sum. Instead, each worker simply hands its update to a parameter server, which applies it to the shared parameters.
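Here is a minimal sketch of the asynchronous scheme. It assumes a toy least-squares model trained by threads in a single process. The names `ParameterServer`, `worker`, and `train_async` are hypothetical and do not belong to any framework's API. Each worker pulls a possibly stale copy of the parameters, computes a gradient on its own data shard, and pushes that gradient. The server applies it immediately instead of waiting to sum gradients from all workers.

```python
import random
import threading


class ParameterServer:
    """Hypothetical server holding shared parameters; applies each gradient on arrival."""

    def __init__(self, dim, lr):
        self.params = [0.0] * dim
        self.lr = lr
        self._lock = threading.Lock()  # guards one update at a time, not a global barrier

    def pull(self):
        with self._lock:
            return list(self.params)

    def push(self, grad):
        # No waiting for other workers: the gradient is applied right away,
        # even if it was computed from parameters that are already stale.
        with self._lock:
            for i, g in enumerate(grad):
                self.params[i] -= self.lr * g


def gradient(params, x, y):
    # Assumption: squared-error loss 0.5 * (w.x - y)^2 for a single example.
    err = sum(w * xi for w, xi in zip(params, x)) - y
    return [err * xi for xi in x]


def worker(server, shard, steps, seed):
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(shard)
        params = server.pull()               # possibly stale snapshot
        server.push(gradient(params, x, y))  # hand the update to the server


def train_async(data, num_workers=4, steps=2000, lr=0.05):
    dim = len(data[0][0])
    server = ParameterServer(dim, lr)
    shards = [data[k::num_workers] for k in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(server, shards[k], steps, k))
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.params
```

Because nothing waits for the other workers, a gradient may be computed from parameters that another worker has already changed. The lock only prevents two individual updates from interleaving. It does not perform a synchronized reduction.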
mak
What criterion is used to decide that the loss is "too high"?
Is it based on domain-specific/application knowledge or on an empirical value? Or is it specified as a requirement?
themj
Generally, you check whether the loss is too high by computing the difference between the current solution and the desired solution. If this difference exceeds a predetermined threshold, the loss is considered too high.
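The check described above can be sketched as follows. The answer does not say which distance measure is used, so the mean squared difference here is an assumption. The function names `loss` and `loss_too_high` and the threshold values in the example are hypothetical.

```python
def loss(current, desired):
    """Mean squared difference between the current and the desired solution.

    Assumption: mean squared difference is used as the distance measure.
    """
    if len(current) != len(desired):
        raise ValueError("current and desired must have the same length")
    return sum((c - d) ** 2 for c, d in zip(current, desired)) / len(current)


def loss_too_high(current, desired, threshold):
    """True when the difference exceeds the predetermined threshold."""
    return loss(current, desired) > threshold
```

For example, `current = [1.0, 2.0, 3.5]` and `desired = [1.0, 2.0, 3.0]` give a loss of 0.25 / 3 ≈ 0.083. That loss counts as too high with a threshold of 0.05 but not with a threshold of 0.1.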