Autoencoders (AEs) are widely used in image fusion. However, AE-based fusion methods usually use the same encoder to extract features from images captured by different sensors/modalities, without considering the differences between them. In addition, these methods cannot fuse images in real time. To solve these problems, an end-to-end fusion network is proposed for fast infrared and visible image fusion. We design an end-to-end W-shaped network (W-Net), which consists of two independent encoders, one shared decoder, and skip connections. The two encoders extract representative features from the two source images, and the decoder combines the hierarchical features from the corresponding layers and reconstructs the fused image without an additional fusion layer or any handcrafted fusion rules. Skip connections help retain the details and salient features in the fused image. Moreover, W-Net is lightweight, with fewer parameters than existing AE-based methods. The experimental results show that our fusion network performs well in both subjective and objective visual assessments compared with other state-of-the-art fusion methods. It fuses images very fast (e.g., the fusion time for 20 pairs of images in the TNO dataset ranges from 0.871 to 1.081 ms), operating above real-time speed.
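To illustrate the data flow only (not the authors' actual network: the real W-Net uses learned convolutional encoders and decoders), a toy numpy sketch of two independent encoders feeding one shared decoder through skip connections might look like:

```python
import numpy as np

def encode(img, levels=3):
    """Toy 'encoder': build a feature pyramid by repeated 2x average pooling."""
    feats = [img]
    for _ in range(levels - 1):
        x = feats[-1]
        # 2x2 average pooling (assumes even spatial dimensions)
        x = 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])
        feats.append(x)
    return feats  # fine (low-level) ... coarse (high-level)

def upsample2(x):
    """Nearest-neighbor 2x upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def decode(feats_ir, feats_vis):
    """Toy shared 'decoder': start at the coarsest level and, at each level,
    merge the upsampled state with skip-connected features from BOTH encoders."""
    x = 0.5 * (feats_ir[-1] + feats_vis[-1])      # merge coarsest features
    for f_ir, f_vis in zip(feats_ir[-2::-1], feats_vis[-2::-1]):
        x = (upsample2(x) + f_ir + f_vis) / 3.0   # skip connections from both sources
    return x

np.random.seed(0)
ir, vis = np.random.rand(16, 16), np.random.rand(16, 16)
fused = decode(encode(ir), encode(vis))
print(fused.shape)  # (16, 16): fused image has the input resolution
```

The point of the sketch is structural: two source-specific encoding paths, one shared decoding path, and per-level skip connections replacing any explicit fusion layer.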
Deep learning-based object detection approaches have shown excellent performance on RGB images. However, when used to detect objects in infrared images, their accuracy may drop significantly due to the low contrast, blurred textures, and strong noise of infrared images. To alleviate this problem, we design a detail enhancement module with a spatial attention mechanism to enhance the textures and details of the images. The output of the proposed module is fed into a modified YOLOv4. We introduce the Alpha-IoU loss and Weighted-NMS into YOLOv4 to better account for geometric factors in both bounding box regression and non-maximum suppression, leading to notable gains in average precision. The experimental results show that, compared with YOLOv4, the mAP0.5 and mAP0.5:0.95 of our model are improved by 1.1% and 3.5%, respectively, effectively improving detection accuracy.
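The Alpha-IoU family generalizes IoU-based regression losses with a power parameter alpha; its basic form is L = 1 - IoU^alpha, where larger alpha up-weights high-IoU examples. A minimal sketch for axis-aligned boxes (the full method may add penalty terms, which the abstract does not detail):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic Alpha-IoU loss: L = 1 - IoU^alpha."""
    return 1.0 - iou(pred, target) ** alpha

pred, gt = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
# IoU = 1/7, so with alpha = 3 the loss is 1 - (1/7)**3
print(alpha_iou_loss(pred, gt))
```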
Accurate and robust detection of prohibited items in x-ray images plays a significant role in protecting public safety. However, the large scale variation of prohibited items and the diverse backgrounds of x-ray images pose many challenges for detection. We propose an effective weight-guided dual-direction-fusion feature pyramid network (WDFPN) that makes full use of multilevel features to solve the scale-variation problem in cluttered backgrounds. Specifically, our WDFPN mainly consists of a weight-guided upsample fusion pathway (WUFP), an attention-based connection (AC), and a downsample fusion pathway (DFP). WUFP uses channel-wise weights generated from high-level features to weight low-level features, reducing redundant, uninformative content. AC transfers the enhanced low-level detail information to DFP. Subsequently, DFP improves the localization capacity of the entire feature pyramid through its bottom-up fusion pathway. Extensive experiments on the security inspection x-ray and occluded prohibited items x-ray datasets demonstrate the superiority of our WDFPN in detecting prohibited items.
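The abstract does not give WUFP's exact form; a squeeze-style sketch of "channel-wise weights generated from high-level features used to weight low-level features" could look like this (the actual module in the paper may differ):

```python
import numpy as np

def channel_weights(high_feat):
    """Derive per-channel weights from a high-level feature map of shape
    (C, H, W): global average pooling followed by a sigmoid."""
    gap = high_feat.mean(axis=(1, 2))      # (C,)
    return 1.0 / (1.0 + np.exp(-gap))      # sigmoid -> weights in (0, 1)

def weighted_fusion(low_feat, high_feat):
    """Weight low-level features channel-wise with weights generated from
    high-level features, then add the upsampled high-level features."""
    w = channel_weights(high_feat)                  # (C,)
    weighted_low = low_feat * w[:, None, None]      # suppress weak channels
    up = np.repeat(np.repeat(high_feat, 2, axis=1), 2, axis=2)  # 2x upsample
    return weighted_low + up

np.random.seed(0)
low = np.random.rand(8, 16, 16)    # low-level: fine spatial detail
high = np.random.rand(8, 8, 8)     # high-level: coarser, more semantic
out = weighted_fusion(low, high)
print(out.shape)  # (8, 16, 16)
```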
Siamese trackers have attracted great attention in visual object tracking due to their real-time speed and high accuracy. In this paper, we propose a dual path aggregation network (SiamDPAN) for high-performance tracking. First, we build a multi-level similarity map aggregation (MSA) structure, which predicts and fuses similarity maps from multi-level features. Second, we propose a mask path aggregation (MPA) module that better captures the appearance changes of objects by propagating maps from low-level layers. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker. Training our network on only two datasets, it achieves 0.436 EAO on VOT2016 and 0.351 EAO on VOT2018.
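The core Siamese-tracking operation is correlating template features over search-region features to produce a similarity map, and MSA fuses such maps from several feature levels. A hand-rolled sketch on raw pixels (the real tracker correlates learned deep features), using normalized cross-correlation so the peak marks the match:

```python
import numpy as np

def ncc2(search, template):
    """Normalized cross-correlation of a template over a search region:
    cosine similarity per window, yielding a similarity map."""
    th, tw = template.shape
    tnorm = np.linalg.norm(template)
    H = search.shape[0] - th + 1
    W = search.shape[1] - tw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = search[i:i + th, j:j + tw]
            out[i, j] = np.sum(patch * template) / (np.linalg.norm(patch) * tnorm)
    return out

def fuse_similarity_maps(maps, weights):
    """Weighted fusion of similarity maps predicted from multiple levels."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * m for w, m in zip(weights, maps))

np.random.seed(0)
search = np.random.rand(16, 16)
template = search[5:9, 6:10]            # "object" cut from the search region
sim = ncc2(search, template)
peak = np.unravel_index(np.argmax(sim), sim.shape)
print(peak)  # (5, 6): the peak recovers the object location
```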
A new algorithm for medical image fusion is proposed, combining a gradient minimization smoothing filter (GMSF) with a nonsubsampled directional filter bank (NSDFB). To preserve more detail information, a multiscale edge-preserving decomposition framework (MEDF) is used to decompose an image into a base image and a series of detail images. For the fusion of the base images, a local Gaussian membership function is applied to construct the fusion weighting factor. For the fusion of the detail images, the NSDFB is applied to decompose each detail image into multiple directional sub-images, which are then fused by a pulse coupled neural network (PCNN). The experimental results demonstrate that the proposed algorithm is superior to the compared algorithms in both visual effect and objective assessment.
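The abstract does not give the exact Gaussian membership construction; one plausible form of membership-weighted base-image fusion (a hypothetical sketch, not the paper's definition) is:

```python
import numpy as np

def gaussian_membership(img):
    """Gaussian membership w.r.t. the image statistics: pixels near the mean
    get membership close to 1. (Hypothetical form; the paper's exact
    construction is not given in the abstract.)"""
    mu, sigma = img.mean(), img.std()
    return np.exp(-((img - mu) ** 2) / (2.0 * sigma ** 2 + 1e-12))

def fuse_base(base_a, base_b):
    """Fuse two base images with membership-derived, pixel-wise weights."""
    wa, wb = gaussian_membership(base_a), gaussian_membership(base_b)
    return (wa * base_a + wb * base_b) / (wa + wb + 1e-12)

np.random.seed(0)
a, b = np.random.rand(8, 8), np.random.rand(8, 8)
fused = fuse_base(a, b)
print(fused.shape)  # (8, 8)
```

Because the weights are normalized per pixel, each fused value is a convex combination of the two base-image values at that pixel.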
Millimeter-wave images have low resolution and heavy noise, which makes it hard to detect edges in them. An edge detection scheme based on the curvelet transform is proposed. The idea is to first suppress noise with the wrapping-based curvelet transform, then compute the gradient magnitude at each pixel, and finally apply non-maximum suppression and double thresholding to obtain the edges. Experiments show that clear edges of humans and objects in millimeter-wave images can be detected efficiently, and the scheme runs fast.
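The non-maximum-suppression and double-threshold stage is the standard Canny-style edge linking; a minimal sketch of the double-threshold step (curvelet denoising, gradient computation, and non-maximum suppression omitted) might be:

```python
import numpy as np

def double_threshold_edges(grad_mag, low=0.1, high=0.3):
    """Canny-style double thresholding: pixels above `high` are strong edges;
    pixels between `low` and `high` are kept only if 4-connected (here via
    iterative dilation) to a strong edge."""
    strong = grad_mag >= high
    weak = (grad_mag >= low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:  # propagate strong labels into adjacent weak pixels
        grown = edges.copy()
        grown[1:, :] |= edges[:-1, :]
        grown[:-1, :] |= edges[1:, :]
        grown[:, 1:] |= edges[:, :-1]
        grown[:, :-1] |= edges[:, 1:]
        new_edges = edges | (grown & weak)
        changed = bool((new_edges != edges).any())
        edges = new_edges
    return edges

mag = np.array([[0.0, 0.2, 0.4],
                [0.0, 0.2, 0.0],
                [0.0, 0.05, 0.0]])
print(double_threshold_edges(mag).astype(int))
# [[0 1 1]
#  [0 1 0]
#  [0 0 0]]
```

The weak pixels at (0,1) and (1,1) survive because they chain back to the strong pixel at (0,2); the isolated 0.05 pixel is discarded.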
Recent research has demonstrated that a backpropagation neural network classifier is a useful tool for multispectral remote sensing image classification. However, its training time is too long and the network's generalization ability is not good enough. Here, a new method is developed to both accelerate training and increase classification accuracy. The method consists of two steps. First, a simple penalty term is added to the conventional squared error to increase the network's generalization ability. Second, a fixed-factor method is used to find the optimal learning rate. We have applied the method to the classification of Landsat MSS data. The results show that the training time is much shorter and the classification accuracy is increased as well. The results are also compared with the maximum likelihood method, demonstrating that the backpropagation neural network classifier is more efficient.
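The abstract does not specify the penalty term; the most common choice is a weight-decay penalty added to the squared error, which shrinks the weights and improves generalization. A sketch under that assumption:

```python
import numpy as np

def penalized_loss(pred, target, weights, lam=1e-3):
    """Conventional squared error plus a simple weight penalty (weight decay,
    assumed here). `lam` trades data fit against weight magnitude; the penalty
    discourages large weights, which tends to improve generalization."""
    sq_err = 0.5 * np.sum((pred - target) ** 2)
    penalty = 0.5 * lam * sum(np.sum(w ** 2) for w in weights)
    return sq_err + penalty

def penalized_grad(grad_plain, w, lam=1e-3):
    """Backprop gradient of a weight matrix under the penalized loss:
    the plain gradient plus lam * w, so each step shrinks the weights."""
    return grad_plain + lam * w

pred = np.array([1.0, 2.0])
target = np.array([1.0, 1.0])
weights = [np.array([2.0, 0.0])]
print(penalized_loss(pred, target, weights, lam=0.1))  # 0.5 + 0.2 = 0.7
```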