In natural driving, an autonomous vehicle must make driving decisions according to road traffic rules, which requires precise recognition of traffic signs at different distances and viewing angles. However, changes in the camera's viewpoint cause traffic signs to appear at variable scales against complex backgrounds, and the coupling of these characteristics poses severe challenges for traffic sign identification and localization. In this paper, a multi-scale feature extraction and coordinate attention network based on the YOLO framework (MFEC-YOLOv7) is proposed to achieve accurate traffic sign recognition in complex scenes. Specifically, features of traffic signs at different scales are extracted and enhanced by fusing pooling layers and group convolution in the convolutional architecture of the backbone network, improving the model's multi-scale feature perception. A coordinate attention (CA) module is introduced to focus on traffic signs under multiple interference factors. In the neck network, a bidirectional feature pyramid works in concert with the CA module to accurately localize feature positions and strengthen the model's feature extraction ability. In addition, depthwise separable convolution is adopted to significantly reduce the number of convolution parameters and improve efficiency without degrading feature extraction. Experimental results on the public TT100K dataset and a self-built dataset show that, compared with YOLOv7, MFEC-YOLOv7 improves detection accuracy by 19.9%, reduces parameter computation by 9.5%, and increases speed by about 16.9%.
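To make the two lightweight components named above concrete, the following is a minimal PyTorch sketch of a coordinate attention module (in the standard formulation of Hou et al., CVPR 2021) and a depthwise separable convolution. The abstract does not specify the paper's layer configuration, so the channel counts, reduction ratio, and activation choices below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: factorizes global pooling into two 1-D pools
    so the attention map retains positional information along each spatial
    axis. Standard formulation; the reduction ratio is an assumed default,
    not taken from the MFEC-YOLOv7 paper."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height attention
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width attention
        return x * a_h * a_w                      # reweight features per position

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel KxK spatial conv
    followed by a 1x1 pointwise conv, cutting parameters roughly by a
    factor of K*K relative to a dense KxK convolution."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride, padding=k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()  # YOLO-style activation; an assumption here

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)     # a typical neck-level feature map
    out = DepthwiseSeparableConv(64, 128)(CoordinateAttention(64)(x))
    print(out.shape)                   # torch.Size([1, 128, 80, 80])

Because coordinate attention pools along each axis separately rather than collapsing the whole map to one value, it can encode where a small sign sits in the frame, which is consistent with the abstract's claim that CA helps localize feature positions in the neck.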
Keywords: Convolution, Object detection, Feature extraction, Target detection, Neck, Performance modeling, Feature fusion