Research on grid inspection technology based on general knowledge enhanced multimodal large language models

Peng Gao; Zhuyi Rao; Shengpu Gao; Yun Zheng; Ying Li

doi:10.1117/12.2692326

7 March 2024

Proceedings Volume 13086, MIPPR 2023: Pattern Recognition and Computer Vision; 130860B (2024) https://doi.org/10.1117/12.2692326
Event: Twelfth International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2023), 2023, Wuhan, China

Abstract

Object detection based on deep learning has been studied extensively for grid inspection and achieves high detection accuracy and recognition precision. However, pre-trained object detection models lack overall perception and reasoning capabilities; without a holistic understanding of challenging samples, they produce more false positives and missed detections. Recently, multimodal large language models, which combine natural language models with image understanding, have gained significant attention. In this paper, we propose Grid-Blip, a multimodal large model enhanced with general knowledge, and use it to study wildfire detection in grid inspection. Grid-Blip is based on the BLIP model architecture and comprises a natural language model, a visual generation model, and a fusion model. We annotate a large set of grid inspection samples at the whole-image semantic level, providing crucial training samples for research on multimodal large models. Furthermore, we investigate the design of the fusion model network and train it to effectively integrate the pre-trained natural language model and visual generation model. Experimental results demonstrate that, compared with object detection models, the proposed model achieves overall semantic perception and reasoning capabilities. Grid-Blip reduces the false alarm rate for wildfire smoke trend prediction from 20% to 10% and the missed detection rate from 18% to 13%.
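
To put the reported figures in perspective, the short sketch below converts the before and after rates quoted in the abstract into absolute and relative reductions. It is only illustrative arithmetic: the function name `reduction` and the dictionary layout are assumptions made for this example, and the abstract does not state how the two rates are computed from raw detections.

```python
from __future__ import annotations


def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


# Rates quoted in the abstract, in percent: (object detection baseline, Grid-Blip).
reported = {
    "false alarm rate": (20.0, 10.0),
    "missed detection rate": (18.0, 13.0),
}

# Hypothetical helper structure: maps each metric to its two reductions.
results = {name: reduction(b, a) for name, (b, a) in reported.items()}

for name, (absolute, relative) in results.items():
    print(f"{name}: -{absolute:.0f} points, {relative:.1f}% relative reduction")
```

Under these numbers, the false alarm rate falls by 10 percentage points (a 50% relative reduction) and the missed detection rate by 5 percentage points (about 27.8% relative).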

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Peng Gao, Zhuyi Rao, Shengpu Gao, Yun Zheng, and Ying Li "Research on grid inspection technology based on general knowledge enhanced multimodal large language models", Proc. SPIE 13086, MIPPR 2023: Pattern Recognition and Computer Vision, 130860B (7 March 2024); https://doi.org/10.1117/12.2692326

PROCEEDINGS
7 PAGES

KEYWORDS

Visual process modeling

Inspection

Object detection

Education and training

Semantics

Statistical modeling

Visualization
