Slide 39 of 69
adilets

So, if I understand this correctly, you want to use gradient descent on the loss function so that the loss approaches 0.

EggyLv999

Yep. It won't actually reach 0, both because your model probably isn't expressive enough to drive the loss all the way to 0, and because gradient descent finds a local minimum, not a global one. In practice, we use stochastic gradient descent (SGD), which updates on one or a few training examples at a time, instead of batch gradient descent, which uses all of them. Batch gradient descent will always converge to a minimum of the loss on the training examples, but it is much slower.
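To make the batch-vs-stochastic distinction concrete, here is a minimal sketch on a toy 1-D least-squares problem (the data, learning rate, and step counts are made-up illustrative values, not anything from the lecture):

```python
import random

# Toy 1-D linear regression: fit w so that w * x ~= y.
# Hypothetical data generated from y = 3x, so the true minimum is w = 3.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

def grad(w, batch):
    # Gradient of the mean squared error over the given batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def batch_gd(w, lr=0.01, steps=500):
    # Batch gradient descent: every step uses ALL training examples.
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

def sgd(w, lr=0.01, steps=500, seed=0):
    # Stochastic gradient descent: each step uses ONE random example,
    # so each step is cheaper but the descent path is noisier.
    rng = random.Random(seed)
    for _ in range(steps):
        w -= lr * grad(w, [rng.choice(data)])
    return w

print(batch_gd(0.0))  # close to 3.0
print(sgd(0.0))       # also close to 3.0, via a noisier path
```

On a real dataset with millions of examples, the per-step cost difference is the whole point: SGD takes many cheap, noisy steps instead of a few expensive, exact ones.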

MangoSister

Is it true that gradient descent may get stuck in local minima, and is there any strategy to avoid that (does the stochastic gradient descent mentioned by EggyLv999 help)?

yey1

@MangoSister For a very high-dimensional neural network, local minima are not a very serious problem. The real problem is saddle points: the region around a saddle point is nearly flat, so you cannot easily escape it with SGD.

Check these two papers if you have interest:

[1] On the Saddle Point Problem for Non-convex Optimization

[2] Identifying and Attacking the Saddle Point Problem in High-dimensional Non-convex Optimization

To avoid saddle points, you can use techniques such as momentum.

ferozenaina

@MangoSister Stochastic gradient descent (SGD) helps keep the algorithm from getting stuck in local minima. It behaves analogously to the simulated annealing in our assignments: we introduce some randomness into the sampling of points to help us reach a better minimum. The drawback is that the parameters need to be tuned to get the right amount of 'noisiness'. This is also the first time I've heard that saddle points are the main problem for gradient descent.

carnegieigenrac

Why can't a technique similar to the one for escaping local minima be used to escape saddle points? Couldn't a small amount of randomness push the current solution down one of the saddle's slopes?