What if, in the max function, we change 12 to 17? Then it will contribute to the output.
bochet
Something interesting: if x and y output the same value, then the gradient of max should be propagated back to both gates.
williamx
@chenh1 True, this could be an issue. But when we do gradient descent, we usually use step sizes small enough that such a scenario won't occur frequently.
Master
@bochet: In max pooling, if x and y happen to have the same value, only one of their gradients will be propagated, in a deterministic way. However, this seldom happens in practice because step sizes are small.
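A minimal sketch of the tie behavior discussed above, assuming a deterministic convention that routes the gradient to the first input when the two are equal (the function name and the `>=` tie-break are illustrative choices, not anything from the notes):

```python
def max_backward(x, y, dout):
    # The max gate routes the upstream gradient to the larger input.
    # On a tie (x == y), the `>=` comparison deterministically picks x,
    # so only one of the two gradients is propagated.
    dx = dout if x >= y else 0.0
    dy = dout if x < y else 0.0
    return dx, dy

print(max_backward(2.0, 5.0, 1.0))  # (0.0, 1.0): y was larger
print(max_backward(3.0, 3.0, 1.0))  # (1.0, 0.0): tie goes to x
```

This mirrors what @Master describes: the tie case is resolved deterministically, and since small gradient steps rarely leave two inputs exactly equal, the choice of convention rarely matters.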
POTUS
How the blue numbers were obtained:
+ passes the derivative backwards unchanged to each incoming link
max() routes the gradient to whichever incoming link had the larger value
* takes the incoming gradient and multiplies it by the value on the opposite link
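The three gate rules above can be sketched on a toy circuit. The expression and input values here are made up for illustration (they are not the numbers from the figure):

```python
# Toy circuit: f = (x + y) * max(z, w)
x, y, z, w = 3.0, -4.0, 2.0, -1.0

# Forward pass
q = x + y      # + gate: q = -1.0
m = max(z, w)  # max gate: m = 2.0
f = q * m      # * gate: f = -2.0

# Backward pass, starting from df/df = 1
df = 1.0
dq = m * df    # * gate: multiply by the value on the opposite link
dm = q * df
dx = dq        # + gate: pass the gradient through unchanged
dy = dq
dz = dm if z >= w else 0.0  # max gate: route gradient to the larger input
dw = dm if z < w else 0.0

print(dx, dy, dz, dw)  # 2.0 2.0 -1.0 0.0
```

Note how w receives zero gradient: it lost the max, so nudging it (slightly) cannot change the output.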