Paper
Video-text cross-modal retrieval algorithm based on multiple coding
22 February 2023
Yufan Xu
Proceedings Volume 12587, Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022); 125871L (2023) https://doi.org/10.1117/12.2667669
Event: Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), 2022, Shanghai, China
Abstract
Users today have access to ever more video data and to ever more terminal devices for consuming it. Video platforms such as TikTok and YouTube continue to grow, with user bases and video libraries expanding daily, which creates an urgent practical demand for cross-modal video-text retrieval. This paper proposes a video-text cross-modal retrieval algorithm based on multiple encoding. The global, sequential, and local features of both video and text are encoded, and the encoded features are mapped into a common embedding space for training, loss computation, and optimization. In experiments on the MSR-VTT dataset, the method improves overall performance (R@sum) by 9.22% and 2.86% over existing methods, demonstrating its superiority.
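The abstract's pipeline can be illustrated with a minimal sketch, not the paper's actual architecture: global (mean-pooled), sequential (bidirectional GRU), and local (1-D convolution) features are fused and projected into a common embedding space, with a triplet ranking loss over in-batch negatives. All layer sizes, module names, and the specific loss formulation here are illustrative assumptions.

```python
# Illustrative sketch of multi-level encoding for video-text retrieval.
# Layer choices (biGRU, 1-D conv, linear projection) are assumptions,
# not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelEncoder(nn.Module):
    """Encodes a feature sequence (video frames or text tokens) at three levels."""
    def __init__(self, feat_dim, hidden_dim, embed_dim):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden_dim, hidden_dim,
                              kernel_size=3, padding=1)
        # Concatenate global + sequential + local features, then project
        # into the common embedding space shared by video and text.
        fused_dim = feat_dim + 2 * hidden_dim + hidden_dim
        self.proj = nn.Linear(fused_dim, embed_dim)

    def forward(self, x):                       # x: (batch, seq_len, feat_dim)
        g = x.mean(dim=1)                       # global: mean pooling
        h, _ = self.gru(x)                      # sequential: biGRU states
        s = h.mean(dim=1)
        l = F.relu(self.conv(h.transpose(1, 2))).max(dim=2).values  # local
        # L2-normalize so dot products are cosine similarities.
        return F.normalize(self.proj(torch.cat([g, s, l], dim=1)), dim=1)

def triplet_ranking_loss(vid, txt, margin=0.2):
    """Hinge-based ranking loss using the hardest in-batch negatives."""
    sim = vid @ txt.t()                         # (batch, batch) similarities
    pos = sim.diag().unsqueeze(1)               # matched video-text pairs
    cost_txt = (margin + sim - pos).clamp(min=0)      # video -> text negatives
    cost_vid = (margin + sim - pos.t()).clamp(min=0)  # text -> video negatives
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    cost_txt[mask] = 0
    cost_vid[mask] = 0
    return (cost_txt.max(dim=1).values.mean()
            + cost_vid.max(dim=0).values.mean())
```

In use, one encoder processes video frame features and another processes text token features; both land in the same embedding space, so retrieval reduces to ranking by cosine similarity.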
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yufan Xu "Video-text cross-modal retrieval algorithm based on multiple coding", Proc. SPIE 12587, Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), 125871L (22 February 2023); https://doi.org/10.1117/12.2667669
KEYWORDS
Video, Video coding, Semantic video, Video processing, Education and training, Semantics, Feature extraction