Convolutional Neural Networks (CNNs) have demonstrated great success in image recognition, but most trained models are over-parameterized and can be compressed with only slight performance degradation. Pruning is a network lightweighting technique that lowers the computational cost of inference by selectively removing filters that contribute little to performance. While various methods have been proposed to identify unimportant filters, determining the number of filters to remove at each layer without causing a significant loss of accuracy remains an open problem. This paper proposes a “filter duplication” approach to reduce the accuracy degradation caused by pruning, especially at higher compression ratios. Before pruning, filter duplication replaces unimportant filters in a pre-trained model with copies of critical filters, based on the measured importance of the filters in each convolutional layer. In experiments using mainstream CNN models and datasets, we confirmed that filter duplication improves the accuracy of the pruned model, especially at higher compression ratios. In addition, the proposed method reflects the structural redundancy of the network in the per-layer compression ratio, enabling more efficient compression. The results show that duplicating an appropriate number of critical filters in each layer improves the robustness of the network against pruning, and that further optimization of the duplication strategy is desirable.
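To make the mechanism concrete, the following is a minimal sketch of filter duplication in PyTorch. It assumes the L1 norm of each filter's weights as the importance measure and a caller-supplied duplication count per layer; the function name `duplicate_filters` and these choices are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def duplicate_filters(conv: nn.Conv2d, num_duplicates: int) -> None:
    """Overwrite the least important filters of a conv layer with copies
    of the most important (critical) ones, in place.

    Importance is approximated here by the L1 norm of each filter's
    weights; the paper's importance measure may differ.
    """
    with torch.no_grad():
        # One importance score per output filter: sum of absolute weights.
        scores = conv.weight.abs().sum(dim=(1, 2, 3))
        order = torch.argsort(scores)         # ascending: least important first
        victims = order[:num_duplicates]      # unimportant filters to overwrite
        donors = order[-num_duplicates:]      # critical filters to copy
        conv.weight[victims] = conv.weight[donors].clone()
        if conv.bias is not None:
            conv.bias[victims] = conv.bias[donors].clone()

# Example: duplicate the 4 most important filters of a 16-filter layer
# before handing the model to a pruning routine.
layer = nn.Conv2d(3, 16, kernel_size=3)
duplicate_filters(layer, num_duplicates=4)
```

A subsequent importance-based pruning pass would then find more removable (duplicated) filters in layers with high structural redundancy, which is how the per-layer duplication count can steer the per-layer compression ratio.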