Proceedings Article | 4 December 2024
Jingcheng Shi, Jiaxing Li, Zhiwei Jiang, Shu Zhang, Ningning Song, Shiguo Chen, Liping Wei
KEYWORDS: Object detection, Infrared radiation, Infrared imaging, Small targets, Infrared detectors, Target detection, Transformers, Thermal modeling, Image processing, Education and training
Infrared small target detection is crucial in applications such as nighttime vessel inspection and disaster warning systems. Compared with traditional object detection tasks, infrared scenes present distinctive challenges. First, because of the target's distance or shape, the target occupies an exceedingly small proportion of the image while the background is complex. In extreme cases, the target comprises only a few pixels. Second, targets in infrared detection tasks are typically sparsely distributed and low in contrast: an image contains only a few instances, each occupying a minuscule portion of the entire infrared image. Meanwhile, the processing system has high real-time requirements, and multi-frame performance is difficult to achieve. To address these problems, we propose ETSS-Net, a novel Detection Transformer (DETR) network for infrared images that integrates an efficient Transformer with a novel scale sequence attention mechanism. The contributions of this work can be summarized as follows: (1) Following the success of Transformers in computer vision, recent studies have tried to reduce the complexity of Transformers in detection tasks. However, Transformer variants still have considerably more parameters than some lightweight convolutional neural networks. Motivated by this, we design a self-attention Transformer block called Frequency-based Intrascale Feature Interaction (FIFI), inspired by the interaction and expression of frequency information between image pixels. (2) We propose a plug-and-play dimension scale selection module (DSSM) to balance detection speed, detection performance, and the number of parameters, and to make the proposed SLAM improve the performance of the detection model. It simultaneously incorporates spatial and channel information, as well as local and global information, during training.
(3) The proposed ETSS-Net improves detection performance on infrared small targets. The designed backbone Transformer and attention mechanism enhance the model's feature learning ability. Experiments on multiple infrared datasets show that the proposed method improves the feature representation and detection ability of small target detection. Our method also outperforms state-of-the-art methods in both accuracy and number of parameters.