Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Slide 21 of 46

tarabyte

Unlike during network evaluation, you must hold onto all of these layer outputs until backpropagation comes back through the network, because the backward pass needs them to compute gradients. This is why the memory available limits the size of the network you can train.
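To make the memory difference concrete, here is a rough sketch that compares activation memory for evaluation versus training. It assumes a plain chain of layers (no skip connections), 4-byte floats, and that during evaluation a layer's input can be freed as soon as its output is computed. The function name and the layer sizes are hypothetical, chosen only for illustration.

```python
def activation_bytes(layer_output_elems, bytes_per_elem=4):
    """Return (inference_peak, training_total) activation memory in bytes.

    layer_output_elems: element counts of the network input followed by
    each layer's output, in order (hypothetical numbers, simple chain).
    """
    sizes = [n * bytes_per_elem for n in layer_output_elems]

    # Evaluation: only one layer's input and output are live at a time,
    # so the peak is the largest adjacent pair of buffers.
    inference_peak = max(
        (a + b for a, b in zip(sizes, sizes[1:])),
        default=sum(sizes),  # zero or one buffer: nothing to pair up
    )

    # Training: every output is kept until the backward pass reaches it.
    training_total = sum(sizes)
    return inference_peak, training_total


if __name__ == "__main__":
    peak, total = activation_bytes([1000, 4000, 4000, 2000, 10])
    print("evaluation peak bytes:", peak)
    print("training total bytes:", total)
```

Real frameworks complicate this picture (in-place operations, recomputation, batching), so treat the numbers as an order-of-magnitude estimate only.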

BigPapaChu

Right now it's a 3D representation of a training grid (like 3x3x384). Are there possibly 4D representations in neural nets (like 3x3x3x384)? Or is that pointless?

crow

@bigpapachu Yes, this comes into play when working with video data or 3D data (such as CT scans).
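As a quick arithmetic check on the shapes mentioned above, the sketch below counts the weights in one filter. It assumes the trailing 384 is the number of input channels and the leading dimensions are the spatial (or spatial plus time/depth) extent of the filter; the helper name is hypothetical.

```python
import math


def filter_weights(spatial_dims, in_channels):
    """Number of weights in one convolution filter (bias not counted)."""
    return math.prod(spatial_dims) * in_channels


if __name__ == "__main__":
    # 3x3x384: a 2D filter over an image-like input
    print(filter_weights((3, 3), 384))
    # 3x3x3x384: a 3D filter over video frames or a CT volume
    print(filter_weights((3, 3, 3), 384))
```

Under these assumptions the extra dimension triples the weights per filter, and the layer outputs gain a dimension too, so the activation memory discussed earlier grows along with it.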

rsvaidya

If you are using momentum, where a velocity term accumulates past gradients to compute each update, you would need to hold on to those velocities as well while training.
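A minimal sketch of why momentum adds to the memory footprint: the classic update keeps one velocity value per parameter, and that buffer has to persist across training steps. The function name, learning rate, and momentum coefficient here are illustrative assumptions, and other optimizers keep different (often more) state.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """Apply one in-place SGD-with-momentum update.

    params, grads, and velocity are equal-length lists of floats.
    velocity must be kept between calls, which is the extra memory cost.
    """
    for i, g in enumerate(grads):
        # Blend the previous velocity with the new gradient step.
        velocity[i] = momentum * velocity[i] - lr * g
        params[i] += velocity[i]


if __name__ == "__main__":
    params = [1.0, -2.0]
    velocity = [0.0] * len(params)  # same size as the parameters
    for _ in range(2):
        sgd_momentum_step(params, [0.5, -1.0], velocity, lr=0.1)
    print(params, velocity)
```

The velocity list is exactly as large as the parameter list, so plain momentum roughly doubles the per-parameter storage compared with holding the weights alone.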