Multiscale object detection is challenging because objects appear at widely varying scales and belong to many classes. Convolutional neural networks are commonly used to extract features; however, repeated downsampling operations and spatial position quantization cause detailed information to be lost. As a result, high-level feature maps cannot distinguish the features of adjacent objects, and small objects in particular are missed. To address these problems, we propose a model for multiscale object detection. First, we design a dual-bottleneck subconvolutional network that extracts shallow features from a multiscale image pyramid; this subnetwork supplies additional spatial information to the feature maps of the backbone network. Then, we introduce a new downsampling method that avoids discarding large amounts of detailed information during downsampling. Finally, we use a loss function that alleviates the foreground–background class imbalance. In an ablation study, we investigate how the image pyramid module should be matched to the backbone network and compare different feature fusion modules. We evaluate the proposed model on two publicly available large-scale datasets and observe that it achieves an 8.5% improvement over the classical feature pyramid network while exhibiting acceptable detection performance for occluded and diverse objects.
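The abstract does not give the imbalance-alleviating loss in closed form. A standard choice for this purpose in one-stage detectors is the focal loss of Lin et al. (2017), sketched below as an illustrative assumption; the function name, signature, and default parameters are ours, not the authors' implementation.

```python
# A minimal sketch of binary focal loss, a common remedy for
# foreground-background class imbalance. Assumed for illustration;
# the paper's actual loss may differ.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits:  raw per-anchor scores, shape (N,)
    targets: 0/1 labels, shape (N,); 1 = foreground, 0 = background
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t is the predicted probability assigned to the true class.
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t re-weights foreground vs. background terms.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma down-weights easy, well-classified examples,
    # so the abundant easy background anchors do not dominate training.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

The gamma term is what counters the imbalance: easy background anchors, which vastly outnumber foreground ones, contribute little loss, while hard examples retain full gradient weight.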
Keywords: Convolution, Image fusion, Data modeling, Performance modeling, Detection and tracking algorithms, Feature extraction, Image resolution