For step 2, how do we retrain network to improve quality? Does it mean we add constraints to original network that the weights we cluster to the same category must have same value, and train the network under these constraints?

blairwaldorf

@Fantasy: Here's my guess. After pruning, we cluster links together, and then assign everything in each cluster a weight which for now is the centroid of everything clustered in that group. Then, for accuracy's sake, we go back and train with the original data using these new parameters and iteratively do some kind of EM maximisation to fine tune these centroid parameters.

MaxFlowMinCut

This slide shows us an example of how we can take advantage of compression by leveraging facts about the problem domain that we're trying to solve. In this case, we're trying to minimize the memory footprint of the weights of a network. Our knowledge of the problem domain tells us that rounding very small weights to 0 and forming clusters of close-by weights does not significantly impact correctness (and as Kayvon mentioned, can sometimes even be a form of regularization).

Another example of using compression by leveraging knowledge of a problem domain is in JPEG compression. With JPEG compression, we leverage our knowledge that human perception is less sensitive to high frequency changes in an image (i.e. very subtle changes in sharp edges) by compressing /zero-ing out the high-frequency parts of an image under a discrete courier transform.

RX

I'm curious that if we cluster weights, which means multiple connections share the same weight, can we still have those highly optimized matrix multiplication and convolution algorithms?

yimmyz

After we perform link-pruning and compression, not only are we reducing the memory footprint (can be up to 49x as in next slide), but we can also store the result as a sparse matrix and can use a GPU-based sparse matrix library like cuSPARSE in order to do matrix multiplication. I believe that it can provide huge speedup compared to use the original link w/ dense matrix representation.

For step 2, how do we retrain network to improve quality? Does it mean we add constraints to original network that the weights we cluster to the same category must have same value, and train the network under these constraints?

@Fantasy: Here's my guess. After pruning, we cluster links together, and then assign everything in each cluster a weight which for now is the centroid of everything clustered in that group. Then, for accuracy's sake, we go back and train with the original data using these new parameters and iteratively do some kind of EM maximisation to fine tune these centroid parameters.

This slide shows us an example of how we can take advantage of compression by leveraging facts about the problem domain that we're trying to solve. In this case, we're trying to minimize the memory footprint of the weights of a network. Our knowledge of the problem domain tells us that rounding very small weights to 0 and forming clusters of close-by weights does not significantly impact correctness (and as Kayvon mentioned, can sometimes even be a form of regularization).

Another example of using compression by leveraging knowledge of a problem domain is in JPEG compression. With JPEG compression, we leverage our knowledge that human perception is less sensitive to high frequency changes in an image (i.e. very subtle changes in sharp edges) by compressing /zero-ing out the high-frequency parts of an image under a discrete courier transform.

I'm curious that if we cluster weights, which means multiple connections share the same weight, can we still have those highly optimized matrix multiplication and convolution algorithms?

After we perform link-pruning and compression, not only are we reducing the memory footprint (can be up to 49x as in next slide), but we can also store the result as a sparse matrix and can use a GPU-based sparse matrix library like cuSPARSE in order to do matrix multiplication. I believe that it can provide huge speedup compared to use the original link w/ dense matrix representation.