pht

If the number of training samples is very large, gradient descent can be slow because every update of the parameters requires a pass over the entire training set. Stochastic gradient descent, on the other hand, is faster per update because it uses only a single training sample, so the parameters start improving right away from the very first sample.
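A minimal sketch of the difference, assuming linear regression with squared loss (the data `X`, `y` and the helper names `batch_gd_step` / `sgd_step` are illustrative, not from the slide):

```python
import numpy as np

# Illustrative data: linear regression with squared loss on (X, y).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # 1000 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

lr = 0.01

def batch_gd_step(w, X, y, lr):
    # Batch gradient descent: one update touches the full training set.
    grad = X.T @ (X @ w - y) / len(y)                # average gradient over all samples
    return w - lr * grad

def sgd_step(w, x_i, y_i, lr):
    # Stochastic gradient descent: one update uses a single sample,
    # so the parameters begin improving after the first example.
    grad = (x_i @ w - y_i) * x_i
    return w - lr * grad

w = np.zeros(3)
for epoch in range(5):
    for i in rng.permutation(len(y)):                # shuffle, then update per sample
        w = sgd_step(w, X[i], y[i], lr)
print(w)                                             # should approach [1.0, -2.0, 0.5]
```

With n samples, one `batch_gd_step` costs O(n) work per parameter update, while `sgd_step` costs O(1), which is why SGD makes progress long before a single full pass over the data would finish.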