Poster + Paper
VTR: an optimized vision transformer for SAR ATR acceleration on FPGA
7 June 2024
Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart
Conference Poster
Abstract
Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique in military applications such as remote-sensing image recognition. Vision Transformers (ViTs) are the state of the art in various computer vision applications, outperforming Convolutional Neural Networks (CNNs). However, using ViTs for SAR ATR is challenging for two reasons: (1) standard ViTs require extensive training data to generalize well because of their weak locality inductive bias, while standard SAR datasets contain only a limited number of labeled training samples, reducing the learning capability of ViTs; and (2) ViTs have a high parameter count and are computation-intensive, which makes their deployment on resource-constrained SAR platforms difficult. In this work, we develop a lightweight ViT model that can be trained directly on small datasets without pre-training. To this end, we incorporate the Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules into the ViT model. We train this model directly on SAR datasets to evaluate its effectiveness for SAR ATR. The proposed model, VTR (ViT for SAR ATR), is evaluated on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. Experimental results show that VTR achieves a classification accuracy of 95.96%, 93.47%, and 99.46% on the MSTAR, SynthWakeSAR, and GBSAR datasets, respectively. VTR matches the accuracy of state-of-the-art models on the MSTAR and GBSAR datasets with 1.1× and 36× smaller model sizes, respectively; on the SynthWakeSAR dataset, it achieves higher accuracy with a 17× smaller model. Further, a novel FPGA accelerator is proposed for VTR to enable real-time SAR ATR. Compared with implementations of VTR on state-of-the-art CPU and GPU platforms, our FPGA implementation reduces latency by 70× and 30×, respectively. For inference on small batch sizes, it also achieves 2× higher throughput than the GPU.
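As an illustration of the two modules named in the abstract, the sketch below implements the core ideas of Shifted Patch Tokenization (patchifying the image concatenated channel-wise with diagonally shifted copies of itself, to increase locality) and Locality Self-Attention (softmax attention with a learnable temperature and the self-token diagonal masked out) in plain NumPy. This is a minimal sketch, not the paper's implementation: the function names are hypothetical, the half-patch shift size is an assumption, and `np.roll` stands in for zero-padded shifts; in a real model the tokens would also be layer-normalized and linearly projected, and the temperature would be a learned parameter.

```python
import numpy as np

def shifted_patch_tokenization(img, patch_size):
    """SPT sketch: concatenate the image with four diagonally shifted
    copies, then split into non-overlapping flattened patch tokens.

    img: (H, W, C) array; H and W assumed divisible by patch_size.
    Returns: (num_patches, patch_size * patch_size * 5C) tokens.
    """
    h, w, c = img.shape
    s = patch_size // 2  # half-patch shift (assumed)
    shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]  # four diagonal directions
    # np.roll wraps around; the original uses zero-padded shifts instead
    views = [img] + [np.roll(img, shift=sh, axis=(0, 1)) for sh in shifts]
    x = np.concatenate(views, axis=-1)  # (H, W, 5C)
    ph, pw = h // patch_size, w // patch_size
    x = x.reshape(ph, patch_size, pw, patch_size, 5 * c)
    return x.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)

def locality_self_attention(q, k, v, temperature=1.0):
    """LSA sketch: dot-product attention with a (learnable) temperature
    in place of the fixed sqrt(d) scale, and the diagonal masked so a
    token cannot attend to itself, sharpening attention on other tokens.

    q, k, v: (N, D) arrays for a single head.
    """
    n = q.shape[0]
    scores = (q @ k.T) / temperature
    scores[np.eye(n, dtype=bool)] = -np.inf  # diagonal masking
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v
```

With a 32×32 single-channel SAR chip and 8×8 patches, SPT here produces 16 tokens of dimension 8·8·5 = 320, reflecting the five stacked views.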
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, and Carl Busart "VTR: an optimized vision transformer for SAR ATR acceleration on FPGA", Proc. SPIE 13030, Image Sensing Technologies: Materials, Devices, Systems, and Applications XI, 130300F (7 June 2024); https://doi.org/10.1117/12.3013580
KEYWORDS
Synthetic aperture radar
Data modeling
Matrices
Performance modeling
Education and training
Field programmable gate arrays
Head
