What if, in the max function, we change 12 to 17? Then it will contribute to the output.
bochet
Something interesting: if x and y output the same value, then the gradient of max should be propagated back to both gates.
williamx
@chenh1 True, this could be an issue. But when we do gradient descent, we usually use step sizes small enough that such a scenario won't occur frequently.
Master
@bochet: In max pooling, if x and y happen to have the same value, only one of their gradients will be propagated, in a deterministic way. However, this seldom happens in practice because step sizes are small.
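A minimal sketch of the tie behavior discussed above, assuming a deterministic convention that routes the gradient to the first input when the two are equal (the function name and the `>=` tie-break are illustrative choices, not anything from the notes):

```python
def max_backward(x, y, dout):
    # The max gate routes the upstream gradient to the larger input.
    # On a tie (x == y), the `>=` comparison deterministically picks x,
    # so only one of the two gradients is propagated.
    dx = dout if x >= y else 0.0
    dy = dout if x < y else 0.0
    return dx, dy

print(max_backward(2.0, 5.0, 1.0))  # (0.0, 1.0): y was larger
print(max_backward(3.0, 3.0, 1.0))  # (1.0, 0.0): tie goes to x
```

This mirrors what @Master describes: the tie case is resolved deterministically, and since small gradient steps rarely leave two inputs exactly equal, the choice of convention rarely matters.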
POTUS
How the blue numbers were obtained:
+ passes the derivative backwards unchanged to each incoming link
max() routes the gradient to whichever incoming link had the larger value
* takes the incoming gradient and multiplies it by the value on the opposite link
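The three gate rules above can be sketched on a toy circuit. The expression and input values here are made up for illustration (they are not the numbers from the figure):

```python
# Toy circuit: f = (x + y) * max(z, w)
x, y, z, w = 3.0, -4.0, 2.0, -1.0

# Forward pass
q = x + y      # + gate: q = -1.0
m = max(z, w)  # max gate: m = 2.0
f = q * m      # * gate: f = -2.0

# Backward pass, starting from df/df = 1
df = 1.0
dq = m * df    # * gate: multiply by the value on the opposite link
dm = q * df
dx = dq        # + gate: pass the gradient through unchanged
dy = dq
dz = dm if z >= w else 0.0  # max gate: route gradient to the larger input
dw = dm if z < w else 0.0

print(dx, dy, dz, dw)  # 2.0 2.0 -1.0 0.0
```

Note how w receives zero gradient: it lost the max, so nudging it (slightly) cannot change the output.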