If the number of training samples is very large, gradient descent may be too slow: in every iteration, the parameters are updated only after a full pass through the entire training set. Stochastic gradient descent is typically faster because each update uses just one training sample, so the parameters start improving from the very first example.
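The contrast can be sketched as follows. This is a minimal illustration on least-squares linear regression, with made-up data and step sizes chosen for the example (the function names and settings here are assumptions, not from the original answer): the batch version computes its gradient over all samples before moving the weights once, while the stochastic version updates the weights after every single sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 3
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=n_samples)

def batch_gradient_step(w, X, y, lr=0.1):
    # One iteration of full-batch gradient descent: the gradient
    # averages over ALL rows of X before w moves once.
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sgd_epoch(w, X, y, lr=0.01):
    # One pass of stochastic gradient descent: w is updated after
    # EVERY sample, so it starts improving from the first example.
    for i in rng.permutation(len(y)):
        xi, yi = X[i], y[i]
        grad = (xi @ w - yi) * xi
        w = w - lr * grad
    return w

# 100 batch iterations = 100 full passes over the data.
w_batch = np.zeros(n_features)
for _ in range(100):
    w_batch = batch_gradient_step(w_batch, X, y)

# A single SGD epoch = one pass, but 1000 parameter updates.
w_sgd = sgd_epoch(np.zeros(n_features), X, y)
```

The trade-off, as a design note: each SGD update is much cheaper (one sample instead of all of them), but it is a noisy estimate of the true gradient, so the iterates tend to bounce around the minimum rather than converge smoothly.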