To reduce the error for this training example, we could decrease p1 by 5, or increase p2 by 2.
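The excerpt above is describing a single gradient-descent step: nudge each parameter in whichever direction lowers the error, by an amount related to the gradient. Below is a minimal sketch of that step; the toy error function, the starting values of p1 and p2, and the learning rate are invented for illustration and are not the article's actual example.

```python
# Minimal sketch of one gradient-descent step on two parameters.
# The error function, parameter values, and learning rate are made up for
# illustration; the point is only the update rule (step opposite the gradient).

def error(p1, p2):
    # Toy squared error: pretend the "right" values are p1 = 0, p2 = 10.
    return (p1 - 0.0) ** 2 + (p2 - 10.0) ** 2

def numerical_gradient(f, p1, p2, eps=1e-6):
    # Central-difference estimate of (dE/dp1, dE/dp2).
    g1 = (f(p1 + eps, p2) - f(p1 - eps, p2)) / (2 * eps)
    g2 = (f(p1, p2 + eps) - f(p1, p2 - eps)) / (2 * eps)
    return g1, g2

p1, p2 = 5.0, 8.0   # current parameter values (arbitrary)
lr = 0.1            # learning rate (arbitrary)

g1, g2 = numerical_gradient(error, p1, p2)
# Here dE/dp1 > 0, so p1 is decreased; dE/dp2 < 0, so p2 is increased.
p1 -= lr * g1
p2 -= lr * g2
print(p1, p2, error(p1, p2))  # error is lower than before the step
```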
holard
Doesn't this depend on the convexity of the function? Or do we just accept that this approach may end up getting stuck in local optima?
crow
There is evidence to suggest that neural networks do not fail to train because of nonconvexity, but because of other issues, such as poorly conditioned gradients. Although the loss function often has trillions of local optima, the stochasticity of the training algorithm usually means that networks will not get stuck in them (a sketch of where that stochasticity comes from follows the thread).
dyzz
@holard It is true that you are only guaranteed to converge to a global optimum with a convex function; however, it has been shown in practice that for neural nets, converging to a local optimum is often "good enough".
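To make the stochasticity crow mentions concrete, here is a minimal sketch of how estimating the gradient on a small random minibatch makes each update direction noisy while the overall trajectory still heads toward a good solution. The data, the linear least-squares model, and all sizes below are invented purely for illustration.

```python
# Minimal sketch of where SGD's stochasticity comes from: the gradient is
# estimated on a random minibatch, so every update direction is slightly noisy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))               # 1000 toy examples, 2 features
true_w = np.array([3.0, -2.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def grad(w, idx):
    # Gradient of mean squared error over the examples in idx.
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(2)
full = grad(w, np.arange(len(X)))             # full-batch gradient
mini = grad(w, rng.choice(len(X), size=32))   # minibatch estimate: close, but noisy
print(full, mini)

# A few SGD steps: each follows a slightly different (noisy) direction.
lr = 0.05
for _ in range(200):
    idx = rng.choice(len(X), size=32)
    w -= lr * grad(w, idx)
print(w)  # ends up near true_w despite the per-step noise
```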