Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Slide 9 of 46

firebb

As the network output gets close to the ground truth, the loss value approaches 0 because the division part approaches 1. If the network result deviates a lot from the ground truth, say (1.0, 0, 0, 0) for this example, the loss value would be huge because the division part is 0.

crow

Note that if the pre-softmax output of the network is (1.0, 0, 0, 0), there is no division by zero, because e^0 = 1.
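
To make this point concrete, here is a minimal sketch that treats (1.0, 0, 0, 0) as pre-softmax scores and computes the softmax and the negative-log loss. The function names and the choice of which class is the ground truth are assumptions for illustration; the slide's actual ground-truth class is not stated in this thread.

```python
import math

def softmax(scores):
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    # Loss is -log of the softmax probability assigned to the true class.
    return -math.log(softmax(scores)[true_class])

scores = [1.0, 0.0, 0.0, 0.0]  # pre-softmax scores from the comment above
probs = softmax(scores)

# Hypothetical ground-truth choices, since the thread does not name one.
loss_if_class0 = cross_entropy(scores, 0)
loss_if_class1 = cross_entropy(scores, 1)

print(probs)
print(loss_if_class0, loss_if_class1)
```

Every probability is strictly positive because each e^0 term contributes 1 to the sum, so the loss stays finite: about 0.74 if class 0 is correct and about 1.74 if class 1 is.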

paracon

The loss value will be high if you are very sure that something is not the correct answer.
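
One way to read this, assuming the loss is the negative log of the probability the network gives to the correct class: the less probability the correct class receives, the faster the loss grows. The helper name and the sample probabilities below are made up for illustration.

```python
import math

def loss_from_prob(p_correct):
    # Negative log of the probability assigned to the correct class.
    return -math.log(p_correct)

# Hypothetical probabilities for the correct class, from confident-and-right
# down to confident that the correct answer is wrong.
sample_probs = (0.9, 0.5, 0.1, 0.001)
losses = [loss_from_prob(p) for p in sample_probs]

for p, loss in zip(sample_probs, losses):
    print(f"p_correct={p}: loss={loss:.4f}")
```

The loss is about 0.11 at p = 0.9 but about 6.9 at p = 0.001, and it is unbounded as p goes to 0.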

1_1

Another example of a common error function is the Mean Squared Error function.
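
As a rough sketch, mean squared error averages the squared differences between the prediction and the target. The one-hot target and the two predictions below are hypothetical values chosen for illustration, not taken from the slide.

```python
def mean_squared_error(predicted, target):
    # Average of squared element-wise differences.
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

target = [0.0, 1.0, 0.0, 0.0]           # hypothetical one-hot ground truth
good_prediction = [0.1, 0.7, 0.1, 0.1]  # most mass on the correct class
bad_prediction = [0.7, 0.1, 0.1, 0.1]   # most mass on a wrong class

mse_good = mean_squared_error(good_prediction, target)
mse_bad = mean_squared_error(bad_prediction, target)
print(mse_good, mse_bad)
```

Unlike the negative-log loss, this error is bounded for probability vectors: here it is about 0.03 for the good prediction and about 0.33 for the bad one.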