Presentation + Paper
Zero-shot object detection for infrared images using pre-trained vision and language models
7 June 2024
Shotaro Miwa, Shun Otsubo, Qu Jia, Yasuaki Susumu
Abstract
Computer vision systems, such as object detectors, traditionally rely on supervised learning over predetermined categories, an approach that faces limitations when applied to infrared images because of dataset constraints. Emerging contrastive vision-language models, such as CLIP (Contrastive Language-Image Pre-Training), offer a transformative alternative: pre-training on extensive image-text pairs yields diverse visual representations integrated with language semantics.

Our work proposes a novel zero-shot object detection approach for infrared images by extending the benefits of CLIP to this domain. We have developed a two-stage system that uses CLIP to detect humans in infrared images: the first stage generates region proposals with a YOLO (You Only Look Once) object detector, and the second stage classifies those proposals with CLIP. Compared with a YOLO model fine-tuned on infrared images, the proposed system achieves comparable performance, demonstrating its efficacy as a zero-shot object detector. This method opens new avenues for infrared image processing that leverage the capabilities of foundation models.
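To make the two-stage pipeline concrete, the sketch below shows one plausible implementation. It is an assumption-laden illustration, not the authors' exact configuration: the YOLOv8 weights (via the ultralytics package), the Hugging Face CLIP ViT-B/32 checkpoint, the text prompts, and the score threshold are all illustrative choices. The structure, though, matches the abstract: YOLO boxes serve as class-agnostic region proposals, and CLIP zero-shot-classifies each cropped region against natural-language prompts.

```python
# Minimal sketch of the two-stage zero-shot detection pipeline.
# Assumptions (not from the paper): ultralytics YOLOv8 for region
# proposals, Hugging Face CLIP ViT-B/32 for zero-shot scoring, and
# illustrative prompts/threshold.
import torch
from PIL import Image
from ultralytics import YOLO
from transformers import CLIPModel, CLIPProcessor

detector = YOLO("yolov8n.pt")  # stage 1: region proposer (hypothetical weights)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative prompts: a "person" class and a background foil.
prompts = ["a thermal infrared image of a person",
           "a thermal infrared image with no person"]

def detect_humans(image_path: str, score_thresh: float = 0.5):
    image = Image.open(image_path).convert("RGB")
    # Stage 1: treat every YOLO box as a class-agnostic proposal.
    boxes = detector(image)[0].boxes.xyxy.tolist()
    detections = []
    for x1, y1, x2, y2 in boxes:
        crop = image.crop((x1, y1, x2, y2))
        # Stage 2: zero-shot classification of the crop with CLIP.
        inputs = processor(text=prompts, images=crop,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
        if probs[0].item() > score_thresh:  # probability of the "person" prompt
            detections.append(((x1, y1, x2, y2), probs[0].item()))
    return detections
```

Because stage 2 is driven purely by text prompts, swapping the target category requires no infrared training data, only a new prompt, which is the zero-shot property the paper exploits.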
Conference Presentation
© (2024) Published by SPIE.
Shotaro Miwa, Shun Otsubo, Qu Jia, and Yasuaki Susumu "Zero-shot object detection for infrared images using pre-trained vision and language models", Proc. SPIE 13046, Infrared Technology and Applications L, 1304619 (7 June 2024); https://doi.org/10.1117/12.3014268
KEYWORDS: Object detection, Infrared imaging, Visual process modeling, Infrared detectors, Computer vision technology, Infrared sensors, Sensors
