22 February 2023 Research on image captioning based on LSTM and YOLOv5 fusion attention mechanism
Xiaolliang Zhang, Qingtao Zeng, Yeli Li, Likun Lu, Weichun Yang
Proceedings Volume 12587, Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022); 125870Y (2023) https://doi.org/10.1117/12.2667667
Event: Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), 2022, Shanghai, China
Abstract
As they grow up, humans easily learn to recognize the objects and landscapes around them and to describe them in detail, but computers cannot. How to teach computers to describe the content of images has become a research direction for many scholars. Once mature, this technology would greatly benefit people with visual impairments, who could come to understand their surroundings and the world through hearing, and it would help robots recognize objects and understand their environment. With the development of artificial intelligence, the capabilities of convolutional neural networks are increasingly compared to those of the human brain. In recent years, scholars have proposed a variety of methods for this problem, including generative adversarial networks. Building on the classic Encoder-Decoder structure, this paper first compares the code implementation and results of ResNet101 as an Encoder on the COCO dataset, and then proposes a new solution that integrates YOLOv5 and LSTM, aiming to improve the model's inference speed and accuracy.
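The abstract does not specify how attention fuses the detector output with the LSTM decoder. As a rough illustration only, the sketch below shows generic soft (dot-product) attention over a handful of region feature vectors, such as an object detector might supply, conditioned on a decoder hidden state. The function names `softmax` and `attend`, the toy feature values, and the choice of dot-product scoring are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Hypothetical soft attention step.

    Scores each region feature by its dot product with the decoder
    hidden state, normalizes the scores with softmax, and returns the
    attention weights plus the weighted-sum context vector.
    """
    scores = [
        sum(f * h for f, h in zip(feature, hidden_state))
        for feature in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    context = [
        sum(w * feature[i] for w, feature in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy inputs (assumed values): three 2-dimensional region features
# and one 2-dimensional decoder hidden state.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In a full captioning model, a context vector of this kind would typically be combined with the previous word embedding as input to the LSTM at each decoding step; whether the paper does so is not stated in the abstract.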
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaolliang Zhang, Qingtao Zeng, Yeli Li, Likun Lu, and Weichun Yang "Research on image captioning based on LSTM and YOLOv5 fusion attention mechanism", Proc. SPIE 12587, Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), 125870Y (22 February 2023); https://doi.org/10.1117/12.2667667
KEYWORDS
Networks

Education and training

Image processing

Evolutionary algorithms

Detection and tracking algorithms

Computer programming

Data modeling
