TY - CONF
T1 - Compressing Deep Convolutional Neural Networks by Stacking Low-dimensional Binary Convolution Filters
AU - Lan, Weichao
AU - Lan, Liang
PY - 2021/5/18
Y1 - 2021/5/18
N2 - Deep Convolutional Neural Networks (CNNs) have been successfully applied to many real-life problems. However, the huge memory cost of deep CNN models poses a great challenge for deploying them on memory-constrained devices (e.g., mobile phones). One popular way to reduce the memory cost of a deep CNN model is to train a binary CNN, where each weight in the convolution filters is either 1 or −1 and can therefore be stored efficiently using a single bit. However, the compression ratio of existing binary CNN models is upper bounded by around 32. To address this limitation, we propose a novel method that compresses deep CNN models by stacking low-dimensional binary convolution filters. Our proposed method approximates a standard convolution filter by selecting and stacking filters from a set of low-dimensional binary convolution filters. This set is shared across all filters in a given convolution layer, so our method achieves a much larger compression ratio than binary CNN models. To train our proposed model, we show theoretically that it is equivalent to selecting and stacking the intermediate feature maps generated by the low-dimensional binary filters; the model can therefore be trained efficiently using the split-transform-merge strategy. We also provide a detailed analysis of the memory and computation cost of our model at inference time. We compared the proposed method with five other popular model compression techniques on three benchmark datasets. Our experimental results clearly demonstrate that the proposed method achieves a much higher compression ratio than existing methods (103.2 times on the VGG-16 network) while maintaining comparable accuracy.
UR - https://ojs.aaai.org/index.php/AAAI/article/view/17002
DO - 10.1609/aaai.v35i9.17002
M3 - Conference proceeding
SN - 9781577358664
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 8235
EP - 8242
BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
PB - AAAI Press
ER -