A lightweight network model based on YOLOx is proposed to address the limited resources of transmission line UAV inspection platforms, the high complexity of target detection algorithms, and slow inference speed. First, the lightweight ShuffleNetV2_Plus network is used as the backbone for feature extraction. The Depthwise Convolution (DWConv) in ShuffleNetV2 is enlarged by replacing the 3×3 DWConv in the ShuffleUnit module with a 5×5 DWConv. The convolution layers of the model are pruned, and the 1×1 Pointwise Convolution (PWConv) in the ShuffleUnit basic unit is pruned as well, which reduces the network parameters while enlarging the receptive field. In addition, an Efficient Channel Attention (ECA) module is added to the neck feature fusion part so that the network focuses better on important regions and improves detection accuracy at a small computational cost. Finally, the ordinary convolution in the YOLOx decoupled detection head is replaced with Depthwise Separable Convolution (DSConv) to further reduce model complexity. The results show that the proposed lightweight model has an inference time of only 5.8 ms, a parameter size of only 4.361 MB, and only 10.725 G FLOPs, and that it achieves high detection accuracy on the combined self-built transmission line dataset.
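The parameter savings behind the DSConv replacement, and the cost of widening the DWConv kernel, can be illustrated with a short weight count. This is a minimal sketch under stated assumptions, not the authors' implementation: the channel widths (256 and 116) are hypothetical values chosen for illustration, biases and normalization layers are ignored, and the helper names are invented here.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def dwconv_params(channels, k):
    """Weight count of a depthwise k x k convolution: one filter per channel."""
    return channels * k * k


def dsconv_params(c_in, c_out, k):
    """Depthwise separable convolution: depthwise k x k, then pointwise 1 x 1."""
    return dwconv_params(c_in, k) + conv_params(c_in, c_out, 1)


# Hypothetical head layer: 256 -> 256 channels with a 3 x 3 kernel.
standard = conv_params(256, 256, 3)
separable = dsconv_params(256, 256, 3)
print("standard conv weights:", standard)
print("DSConv weights:       ", separable)
print("reduction factor:      %.2f" % (standard / separable))

# Hypothetical ShuffleUnit branch with 116 channels: a 5 x 5 depthwise kernel
# costs more weights than a 3 x 3 one, but far fewer than a 1 x 1 PWConv.
print("3x3 DWConv weights:", dwconv_params(116, 3))
print("5x5 DWConv weights:", dwconv_params(116, 5))
print("1x1 PWConv weights:", conv_params(116, 116, 1))
```

Under these assumed widths, the extra weights from the larger depthwise kernel are small compared with a single pointwise layer, which is consistent with the abstract's claim that enlarging the kernel and pruning PWConv can shrink the model while widening the receptive field.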
Natural scene text recognition has been one of the most challenging tasks in recent years. Compared with traditional document text, natural scene text varies in shape and orientation, so the accuracy of scene text recognition still needs improvement. To locate text regions better and recognize text content more accurately, we present a multi-scale deformable convolution network model for text recognition. The input image is first passed through a rectification network to correct irregular text, and a ResNet with an FPN structure is used as the backbone to achieve multi-scale feature extraction. In addition, Add feature fusion is adopted to reduce the loss of feature information and strengthen feature extraction in the text area. A deformable convolution block is introduced in the deep convolution layers to improve the deformation modeling ability of convolution and expand the receptive field. The prediction module adopts the Transformer, abandoning the inherently sequential nature of the RNN to enable parallel computation and to address the long path length between distant dependencies. To evaluate the effectiveness of the proposed method, we trained our model on a mixture of two datasets, MJSynth and SynthText, and tested it on several regular and irregular datasets. The experimental results demonstrate that the method performs well in irregular scene text recognition, especially on CUTE80.
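The Add fusion step in an FPN-style top-down pathway can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the authors' network: feature maps are single-channel nested lists, upsampling is nearest-neighbour by a factor of 2, and the function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_add(lateral, top_down):
    """Upsample the coarser map, then add it element-wise to the finer map."""
    up = upsample_nearest_2x(top_down)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly twice the size of top_down")
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]


# Toy single-channel maps: a 2 x 2 coarse level and a 4 x 4 fine level.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = [[0.5] * 4 for _ in range(4)]

fused = fuse_add(fine, coarse)
for row in fused:
    print(row)
```

Because addition is element-wise, the fused map keeps the same shape as the finer level, whereas concatenation would double the channel count that later layers must process.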