To reduce the error for this training example, we could decrease p1 by 5, or increase p2 by 2.
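The excerpt above is describing a single gradient-descent step: nudge each parameter in whichever direction lowers the error, by an amount related to the gradient. Below is a minimal sketch of that step; the toy error function, the starting values of p1 and p2, and the learning rate are invented for illustration and are not the article's actual example.

```python
# Minimal sketch of one gradient-descent step on two parameters.
# The error function, parameter values, and learning rate are made up for
# illustration; the point is only the update rule (step opposite the gradient).

def error(p1, p2):
    # Toy squared error: pretend the "right" values are p1 = 0, p2 = 10.
    return (p1 - 0.0) ** 2 + (p2 - 10.0) ** 2

def numerical_gradient(f, p1, p2, eps=1e-6):
    # Central-difference estimate of (dE/dp1, dE/dp2).
    g1 = (f(p1 + eps, p2) - f(p1 - eps, p2)) / (2 * eps)
    g2 = (f(p1, p2 + eps) - f(p1, p2 - eps)) / (2 * eps)
    return g1, g2

p1, p2 = 5.0, 8.0   # current parameter values (arbitrary)
lr = 0.1            # learning rate (arbitrary)

g1, g2 = numerical_gradient(error, p1, p2)
# Here dE/dp1 > 0, so p1 is decreased; dE/dp2 < 0, so p2 is increased.
p1 -= lr * g1
p2 -= lr * g2
print(p1, p2, error(p1, p2))  # error is lower than before the step
```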
holard
Doesn't this depend on the convexity of the function? Or do we just accept that this approach may end up getting stuck in local optima?
crow
There is evidence to suggest that neural networks do not fail to train because of nonconvexity, but because of other issues, such as poorly conditioned gradients. Although the loss function often has trillions of local optima, the stochasticity of the training algorithm usually means that networks will not get stuck in them (a sketch of where that stochasticity comes from follows the thread).
dyzz
@holard It is true that you are only guaranteed to converge to a global optimum with a convex function; however, it has been shown in practice that for neural nets, converging to a local optimum is often "good enough".
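To make the stochasticity crow mentions concrete, here is a minimal sketch of how estimating the gradient on a small random minibatch makes each update direction noisy while the overall trajectory still heads toward a good solution. The data, the linear least-squares model, and all sizes below are invented purely for illustration.

```python
# Minimal sketch of where SGD's stochasticity comes from: the gradient is
# estimated on a random minibatch, so every update direction is slightly noisy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))               # 1000 toy examples, 2 features
true_w = np.array([3.0, -2.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def grad(w, idx):
    # Gradient of mean squared error over the examples in idx.
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(2)
full = grad(w, np.arange(len(X)))             # full-batch gradient
mini = grad(w, rng.choice(len(X), size=32))   # minibatch estimate: close, but noisy
print(full, mini)

# A few SGD steps: each follows a slightly different (noisy) direction.
lr = 0.05
for _ in range(200):
    idx = rng.choice(len(X), size=32)
    w -= lr * grad(w, idx)
print(w)  # ends up near true_w despite the per-step noise
```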