Paper
DS-Swin Transformer HRNet for remote sensing images
12 September 2024
Guinan Wu, Qinghong Wu
Proceedings Volume 13256, Fourth International Conference on Computer Vision and Pattern Analysis (ICCPA 2024); 132560W (2024) https://doi.org/10.1117/12.3037861
Event: Fourth International Conference on Computer Vision and Pattern Analysis (ICCPA 2024), 2024, Anshan, China
Abstract
With the growing use of deep learning for remote sensing imagery, this paper proposes a self-attention model, the Dual-Stream Swin Transformer, to address the computational and memory demands that traditional Transformers face on high-resolution images. Specifically: 1) The Dual-Stream Swin Transformer decomposes the traditional Transformer encoder layer into smaller building blocks and introduces a shifted-window mechanism for computing self-attention. 2) Traditional Transformer models require substantial computation and storage on high-resolution images because they compute self-attention over the entire image; the Swin Transformer reduces these requirements markedly by partitioning the image into non-overlapping local windows and computing self-attention within each window. 3) The shifted-window mechanism then lets each window exchange information with its neighboring windows in alternating layers, so cross-window context is captured without reverting to global attention, keeping the cost proportional to the number of windows rather than to the full image. This combination of decomposition and windowed attention makes the Swin Transformer well suited to high-resolution visual inputs: it achieves high accuracy with greater computational efficiency and lower memory consumption, and performs strongly in image classification, object detection, and semantic segmentation. We conducted comparative experiments between the proposed model and other classical networks of the same type. The Dual-Stream Swin Transformer effectively addresses the computational and memory challenges of traditional Transformers on high-resolution images through its decomposition and window mechanisms, offering a new solution for efficiently processing large-scale visual data.
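The abstract does not include implementation details, so the following PyTorch sketch is only a rough illustration of the window-partition and shifted-window self-attention it describes; the module names, window size, and use of a plain multi-head attention layer (without the relative position bias or shifted-window masking of the full Swin Transformer) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of window-based self-attention with an optional cyclic shift,
# in the spirit of the shifted-window mechanism described in the abstract.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows
    of shape (num_windows * B, window_size * window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)


def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition: stitch windows back into a (B, H, W, C) map."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.reshape(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


class ShiftedWindowAttention(nn.Module):
    """Multi-head self-attention computed inside local windows.

    When shift_size > 0 the feature map is cyclically shifted before
    partitioning, so successive layers see different window borders and
    information can flow between neighboring windows. The attention masking
    that the full Swin Transformer applies to shifted windows is omitted
    here for brevity."""

    def __init__(self, dim, num_heads, window_size=7, shift_size=0):
        super().__init__()
        self.window_size = window_size
        self.shift_size = shift_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C), H and W divisible by window_size
        B, H, W, C = x.shape
        if self.shift_size > 0:
            x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))
        # Attention is restricted to each window, so the cost grows with the
        # number of windows instead of quadratically with H * W.
        windows = window_partition(x, self.window_size)
        attn_out, _ = self.attn(windows, windows, windows)
        x = window_reverse(attn_out, self.window_size, H, W)
        if self.shift_size > 0:
            x = torch.roll(x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))
        return x


if __name__ == "__main__":
    # Toy check on a 56x56 feature map with 96 channels (illustrative sizes).
    feat = torch.randn(1, 56, 56, 96)
    regular = ShiftedWindowAttention(dim=96, num_heads=4, window_size=7, shift_size=0)
    shifted = ShiftedWindowAttention(dim=96, num_heads=4, window_size=7, shift_size=3)
    print(shifted(regular(feat)).shape)  # torch.Size([1, 56, 56, 96])
```

Alternating a regular-window layer with a shifted-window layer, as in the toy check above, is what allows neighboring windows to exchange information across layers while each individual attention computation stays local.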
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Guinan Wu and Qinghong Wu "DS-Swin Transformer HRNet for remote sensing images", Proc. SPIE 13256, Fourth International Conference on Computer Vision and Pattern Analysis (ICCPA 2024), 132560W (12 September 2024); https://doi.org/10.1117/12.3037861
KEYWORDS
Transformers, Remote sensing, Windows, Data modeling, Image processing, Performance modeling, Object detection