As the network output get close to the ground truth, the loss value would be close to 0 as the division part would be close to 1. If the network result deviates a lot from the ground truth say (1.0, 0, 0, 0) for this example, the loss value would be huge as the division part is 0.
crow
Note that if the pre-softmax of the network is (1.0, 0, 0, 0), there is no division by zero, because e^0 = 1
paracon
The loss value will be high if you are very sure that something is not the correct answer.
1_1
Another example of a common error function is Mean Squared Error function
As the network output get close to the ground truth, the loss value would be close to 0 as the division part would be close to 1. If the network result deviates a lot from the ground truth say (1.0, 0, 0, 0) for this example, the loss value would be huge as the division part is 0.
Note that if the pre-softmax of the network is (1.0, 0, 0, 0), there is no division by zero, because e^0 = 1
The loss value will be high if you are very sure that something is not the correct answer.
Another example of a common error function is Mean Squared Error function