Most traditional loop closure detection (LCD) methods rely on hand-crafted features, which are sensitive to environmental conditions. Convolutional neural networks (CNNs) cope better with illumination changes because they extract hierarchical features, but they tend to neglect the local spatial characteristics of images. We propose an LCD algorithm that combines VGG16, NetVLAD, and image pyramids to improve LCD accuracy and robustness. Specifically, a three-level image pyramid was constructed by downsampling, and a feature pyramid (FP) layer was then obtained by extracting features with the VGG16 network at each image resolution. The resulting FPs were passed into the VLAD model, which outputs VLAD vectors by summing residuals and applying L2 normalization. Finally, the network was trained with a triplet loss function. Experimental results on two benchmark datasets and a real-scenario dataset demonstrated that this algorithm outperforms both the NetVLAD baseline and the VGG16 network, exhibiting superior feature-learning capability and achieving higher LCD accuracy. Furthermore, it maintained real-time performance with only a 2% increase in processing time. The results indicate that the proposed method detects loop closures even in complex environments with varying conditions and viewpoints. Hence, the approach is suitable for large-scale visual simultaneous localization and mapping (SLAM) applications, such as autonomous driving, where LCD plays a crucial role in mapping.
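The pipeline summarized above (pyramid, per-level features, NetVLAD aggregation, triplet loss) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. VGG16 is replaced by a hypothetical `stand_in_features` function that treats each pixel's channel vector of a 16-channel random array as one local descriptor. Downsampling uses 2x2 average pooling, descriptors from all three pyramid levels are pooled into one set before aggregation, the cluster centres are fixed rather than learned, and the sizes (K = 8 clusters, D = 16 channels) and the triplet margin (0.1) are arbitrary illustrative choices. The soft assignment follows the original NetVLAD formulation.

```python
import numpy as np


def image_pyramid(image, levels=3):
    """Return [level0, level1, ...]; each level halves H and W via 2x2 average pooling."""
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        prev = prev[:h, :w]  # crop odd rows/columns so 2x2 blocks tile exactly
        blocks = prev.reshape(h // 2, 2, w // 2, 2, *prev.shape[2:])
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid


def stand_in_features(level):
    """HYPOTHETICAL stand-in for VGG16 conv features: each spatial position's
    C-channel vector becomes one local descriptor, giving an (H*W, C) array."""
    return level.reshape(-1, level.shape[-1])


def netvlad(descriptors, centers, alpha=1.0):
    """Aggregate (N, D) local descriptors into a (K*D,) vector given (K, D) centres."""
    residuals = descriptors[:, None, :] - centers[None, :, :]  # (N, K, D)
    # Soft assignment a_k(x) = softmax_k(-alpha * ||x - c_k||^2)
    logits = -alpha * np.sum(residuals ** 2, axis=2)  # (N, K)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Residual summation: V_k = sum_i a_k(x_i) * (x_i - c_k), flattened to K*D
    vlad = np.einsum("nk,nkd->kd", assign, residuals).ravel()
    # Global L2 normalisation (original NetVLAD also intra-normalises each V_k first)
    return vlad / (np.linalg.norm(vlad) + 1e-12)


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss: the positive should be closer to the anchor than the negative by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))


rng = np.random.default_rng(0)
centers = rng.normal(size=(8, 16))  # K=8 clusters, D=16 channels (illustrative sizes)


def describe(image):
    """Pyramid -> per-level local descriptors -> pooled set -> one NetVLAD vector."""
    levels = image_pyramid(image, levels=3)
    descs = np.concatenate([stand_in_features(lvl) for lvl in levels], axis=0)
    return netvlad(descs, centers)


query = rng.normal(size=(32, 32, 16))
positive = query + 0.05 * rng.normal(size=query.shape)  # same place, slight change
negative = rng.normal(loc=0.5, size=(32, 32, 16))  # a different "place"
a, p, n = describe(query), describe(positive), describe(negative)
print(a.shape)  # (128,)
print(triplet_loss(a, p, n))
```

Because descriptors from every level are pooled before aggregation in this sketch, the output length stays K*D no matter how many pyramid levels are used. In actual training, the triplet loss would be minimized with respect to the network weights and the cluster centres, which this forward-only sketch does not do.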