7 March 2024 Research on grid inspection technology based on general knowledge enhanced multimodal large language models
Proceedings Volume 13086, MIPPR 2023: Pattern Recognition and Computer Vision; 130860B (2024) https://doi.org/10.1117/12.2692326
Event: Twelfth International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2023), 2023, Wuhan, China
Abstract
Deep-learning-based object detection has been studied extensively in the field of grid inspection and achieves high detection accuracy and recognition precision. However, pre-trained object detection models lack holistic perception and reasoning capabilities; without a whole-scene understanding of challenging samples, they produce more false positives and missed detections. Recently, multimodal large language models, which combine natural language models with image understanding, have attracted significant attention. In this paper, we propose Grid-Blip, a multimodal large model enhanced with general knowledge, and use it to study wildfire detection in grid inspection. Grid-Blip is based on the BLIP model architecture and comprises a natural language model, a visual generation model, and a fusion model. We annotate a large set of grid inspection samples at the whole-image semantic level, providing crucial training data for research on multimodal large models. Furthermore, we investigate the network design of the fusion model and train it to effectively integrate the pre-trained natural language model and visual generation model. Experimental results demonstrate that, compared with object detection models, the proposed multimodal large model achieves overall semantic perception and reasoning capabilities. Grid-Blip reduces the false alarm rate for wildfire smoke trend prediction from 20% to 10% and the missed detection rate from 18% to 13%.
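The abstract reports its results as a false alarm rate and a missed detection rate but does not define either metric. The sketch below shows one common way such rates are computed from confusion counts, plus the relative reduction implied by the reported figures. The definitions (FP / (FP + TN) and FN / (FN + TP)) and all function names are assumptions made for illustration, not the authors' stated method.

```python
def false_alarm_rate(false_positives: int, true_negatives: int) -> float:
    """Assumed definition: share of fire-free samples flagged as wildfire."""
    total = false_positives + true_negatives
    return false_positives / total if total else 0.0


def missed_detection_rate(false_negatives: int, true_positives: int) -> float:
    """Assumed definition: share of real wildfire samples that were not flagged."""
    total = false_negatives + true_positives
    return false_negatives / total if total else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate dropped, relative to its starting value."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Rates quoted in the abstract (baseline object detector vs. Grid-Blip).
    print(f"false alarm reduction: {relative_reduction(0.20, 0.10):.1%}")
    print(f"missed detection reduction: {relative_reduction(0.18, 0.13):.1%}")
```

Under these assumed definitions, the reported figures correspond to a 50% relative reduction in false alarms and roughly a 28% relative reduction in missed detections.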
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Peng Gao, Zhuyi Rao, Shengpu Gao, Yun Zheng, and Ying Li "Research on grid inspection technology based on general knowledge enhanced multimodal large language models", Proc. SPIE 13086, MIPPR 2023: Pattern Recognition and Computer Vision, 130860B (7 March 2024); https://doi.org/10.1117/12.2692326
KEYWORDS
Visual process modeling
Inspection
Object detection
Education and training
Semantics
Statistical modeling
Visualization