pht

If the number of training samples is very large, gradient descent can be slow because every update of the parameters requires a pass over the entire training set. Stochastic gradient descent, on the other hand, is faster per update because it uses only a single training sample, so the parameters start improving right away from the very first sample.
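A minimal sketch of the difference, assuming linear regression with squared loss (the data `X`, `y` and the helper names `batch_gd_step` / `sgd_step` are illustrative, not from the slide):

```python
import numpy as np

# Illustrative data: linear regression with squared loss on (X, y).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # 1000 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

lr = 0.01

def batch_gd_step(w, X, y, lr):
    # Batch gradient descent: one update touches the full training set.
    grad = X.T @ (X @ w - y) / len(y)                # average gradient over all samples
    return w - lr * grad

def sgd_step(w, x_i, y_i, lr):
    # Stochastic gradient descent: one update uses a single sample,
    # so the parameters begin improving after the first example.
    grad = (x_i @ w - y_i) * x_i
    return w - lr * grad

w = np.zeros(3)
for epoch in range(5):
    for i in rng.permutation(len(y)):                # shuffle, then update per sample
        w = sgd_step(w, X[i], y[i], lr)
print(w)                                             # should approach [1.0, -2.0, 0.5]
```

With n samples, one `batch_gd_step` costs O(n) work per parameter update, while `sgd_step` costs O(1), which is why SGD makes progress long before a single full pass over the data would finish.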