Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

kayvonf

This table is from Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization, and Huffman Coding by Han et al. (ICLR16).

You might also be interested in:

Learning both Weights and Connections for Efficient Neural Networks by Han et al.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size by Iandola et al.
EIE: Efficient Inference Engine on Compressed Deep Neural Network by Han et al.

To preserve accuracy, the network has to be retrained after connections are removed. Compressing a trained network therefore requires several additional rounds of training, which can take a long time and requires customizing the training process (sparse connections, weight encoding). I wonder whether we could instead compress the trained model statically, with no retraining at all, while preserving as much of the original network's accuracy as possible.
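To make the "static" idea concrete, here is a minimal sketch of compressing a single weight matrix with no retraining: magnitude pruning followed by uniform quantization of the surviving weights. This is not the procedure from the paper (which retrains after pruning and learns its shared weight values); the function names `prune_by_magnitude` and `quantize_uniform`, the 90% sparsity, the 4-bit codebook, and the random matrix `W` are all assumptions chosen for illustration.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out (roughly) the `sparsity` fraction of weights with smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize_uniform(weights, mask, bits):
    """Snap surviving weights to 2**bits evenly spaced shared values.

    Assumption: uniform spacing stands in for a learned codebook.
    """
    kept = weights[mask]
    if kept.size == 0 or kept.max() == kept.min():
        return weights.copy()
    lo, hi = kept.min(), kept.max()
    step = (hi - lo) / (2 ** bits - 1)
    codes = np.round((weights - lo) / step)   # integer code per weight
    return np.where(mask, lo + codes * step, 0.0)

# Hypothetical stand-in for one trained fully connected layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))

W_pruned, mask = prune_by_magnitude(W, sparsity=0.9)
W_q = quantize_uniform(W_pruned, mask, bits=4)

# Relative error of the layer output on a random input, with no retraining.
x = rng.normal(size=64)
rel_err = np.linalg.norm(W @ x - W_q @ x) / np.linalg.norm(W @ x)
print(f"kept {mask.sum()} of {W.size} weights, relative output error {rel_err:.3f}")
```

On a random matrix like this the output error is large, which is the point of the question: without retraining, the remaining weights get no chance to compensate for the ones that were removed.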

aeu

If we can achieve about the same accuracy from a compressed network, why don't we adapt this approach for all kinds of networks? Why don't we train networks and then compress them once they are trained? Maybe someone more knowledgeable than I can answer this.

@aeu I think it takes time to retrain the model, and retraining is tricky because the model is now represented in a different way.
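As a rough illustration of why retraining needs customization, the sketch below shows one gradient step on a pruned layer where a mask keeps the removed connections at zero. It assumes a toy linear layer with squared-error loss; `masked_sgd_step`, the mask, the learning rate, and the data are hypothetical and not taken from the paper.

```python
import numpy as np

def masked_sgd_step(W, mask, x, y, lr=0.01):
    """One gradient step on 0.5 * ||W x - y||^2 that keeps pruned weights at zero."""
    grad = np.outer(W @ x - y, x)      # gradient of the loss w.r.t. W
    return (W - lr * grad) * mask      # re-apply the mask after the update

def loss(W, x, y):
    return 0.5 * np.sum((W @ x - y) ** 2)

rng = np.random.default_rng(1)
mask = rng.random((4, 8)) > 0.5        # hypothetical pruning pattern
W0 = rng.normal(size=(4, 8)) * mask    # pruned layer: masked entries are zero
x = rng.normal(size=8)
y = rng.normal(size=4)

W1 = W0
for _ in range(10):
    W1 = masked_sgd_step(W1, mask, x, y)

print(loss(W0, x, y), loss(W1, x, y))
```

A standard training loop would update every entry of `W`, so the pruned connections would grow back; the extra masking step is one example of the changes the training process needs.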