Note that the first fully connected layer in VGG is very expensive in terms of memory: there are 7x7x512x4096 weights on that layer.

A recent paper showed that something like 97% of those weights could be removed (set to 0) without reducing the accuracy of the network. This could lead to much faster performance on bandwidth-bound machines.
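The pruning idea can be sketched with simple magnitude pruning: zero out the weights smallest in absolute value until the target sparsity is hit. This is a minimal illustration, not the exact method from the paper; the matrix shape here is a scaled-down stand-in for the real fc6 weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the fc6 weight matrix (real shape: 7*7*512 x 4096).
# Scaled down so the sketch runs instantly.
W = rng.standard_normal((1024, 256)).astype(np.float32)

# Magnitude pruning: zero the 97% of weights with the smallest absolute value.
sparsity = 0.97
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) < threshold, 0.0, W)

frac_zero = 1.0 - np.count_nonzero(W_pruned) / W_pruned.size
print(frac_zero)  # close to 0.97
```

A sparse matrix like this can then be stored and multiplied in a compressed format, which is where the bandwidth savings come from.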
rsvaidya
How do we get the value of 392 MB for the fully connected layer's weight memory?

Is it something like 7x7x512 (the earlier layer's output) * 4096 * 4 bytes ≈ 411 MB? What am I counting extra?
Nothing extra — it's a units difference: 411*1000*1000 bytes ~= 392*1024*1024 bytes. The 392 MB figure uses binary megabytes (1024^2 bytes), while your 411 MB uses decimal megabytes.
Thanks @crow