Computer vision systems, such as object detectors, traditionally rely on supervised learning over predetermined categories, an approach that faces limitations on infrared images, where annotated datasets are scarce. Emerging contrastive vision-language models, such as CLIP (Contrastive Language-Image Pre-training), offer a transformative alternative: pre-trained on extensive image-text pairs, they provide diverse visual representations grounded in language semantics.
Our work proposes a novel zero-shot object detection approach for infrared images by extending the benefits of CLIP to this domain. We develop a two-stage system that uses CLIP to detect humans in infrared images: the first stage generates region proposals with a YOLO (You Only Look Once) object detector, and the second stage classifies each proposal with CLIP. Compared with a YOLO model fine-tuned on infrared images, the proposed system achieves comparable performance, demonstrating its efficacy as a zero-shot object detection approach. This method opens new avenues for infrared image processing that leverage the capabilities of foundation models.
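To make the two-stage pipeline concrete, the following is a minimal sketch assuming the openai/CLIP and ultralytics Python packages; the checkpoint names (yolov8n.pt, ViT-B/32), the prompt wording, and the score threshold are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import clip                      # openai/CLIP package (assumed)
from PIL import Image
from ultralytics import YOLO     # ultralytics package (assumed)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: region proposals from a pretrained YOLO model (not IR-fine-tuned).
detector = YOLO("yolov8n.pt")    # assumed checkpoint
clip_model, preprocess = clip.load("ViT-B/32", device=device)

# Zero-shot label set: the target class plus a background prompt (illustrative).
prompts = ["a thermal infrared image of a person",
           "a thermal infrared image of the background"]
text_tokens = clip.tokenize(prompts).to(device)

def detect_humans(image_path, score_thresh=0.5):
    image = Image.open(image_path).convert("RGB")
    boxes = detector(image)[0].boxes.xyxy.cpu().numpy()  # proposal boxes (x1,y1,x2,y2)

    detections = []
    with torch.no_grad():
        text_feat = clip_model.encode_text(text_tokens)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)
        for x1, y1, x2, y2 in boxes:
            # Stage 2: crop each proposal and score it with CLIP.
            crop = preprocess(image.crop((x1, y1, x2, y2))).unsqueeze(0).to(device)
            img_feat = clip_model.encode_image(crop)
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
            probs = (100.0 * img_feat @ text_feat.T).softmax(dim=-1)[0]
            if probs[0].item() >= score_thresh:   # "person" prompt wins
                detections.append(((x1, y1, x2, y2), probs[0].item()))
    return detections
```

Because CLIP only re-scores the proposals, the detector itself never needs infrared-specific labels; the zero-shot transfer happens entirely through the text prompts.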