Exploring complementarity of global and local information for effective lip reading
2 March 2023
Yewei Xiao, Lianwei Teng, Xuanming Liu, Aosu Zhu
Abstract

Lip reading aims to recognize text from a talking face without audio information. Recent work has focused on how to extract spatial and temporal information effectively. We introduce an innovative two-stream network that makes full use of the complementarity of global and local spatial information. The global stream generates the global spatial information directly. In the local stream, a patch selection module uses an attention mechanism to select the critical local information. The fused features of the two streams, together with the global features, are then fed into a temporal module to further explore temporal clues. We design a global information guide loss to guide the selection of local information from the fused features, and a mutual learning loss so that the global and local streams learn from each other. Extensive experiments on the LRW and CAS-VSR-W1K datasets demonstrate the superiority of our two-stream network.
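
The abstract does not give the exact form of the patch selection module or of the mutual learning loss, so the following is only a minimal sketch under stated assumptions: patch selection is taken to be a top-k choice over softmax-normalized attention scores, and the mutual learning loss is taken to be a symmetric KL divergence between the class distributions predicted by the two streams. The function names `select_patches` and `mutual_learning_loss`, and the parameter `k`, are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_patches(attention_scores, k):
    """Assumed selection rule: keep the k patches with the highest
    attention weight and return their indices in ascending order."""
    weights = softmax(attention_scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(global_logits, local_logits):
    """Assumed loss form: symmetric KL between the two streams' predictions,
    so each stream is pulled toward the other."""
    p = softmax(global_logits)
    q = softmax(local_logits)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Four hypothetical patches; the two with the largest scores are kept.
    print(select_patches([0.1, 2.0, -1.0, 1.5], k=2))
    # Identical predictions give zero loss; disagreeing ones give a positive loss.
    print(mutual_learning_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(mutual_learning_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In this sketch the loss is zero when both streams agree and grows as their predictions diverge, which is one common way to let two branches learn from each other. The paper's actual formulation may differ.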

© 2023 SPIE and IS&T
Yewei Xiao, Lianwei Teng, Xuanming Liu, and Aosu Zhu "Exploring complementarity of global and local information for effective lip reading," Journal of Electronic Imaging 32(2), 023001 (2 March 2023). https://doi.org/10.1117/1.JEI.32.2.023001
Received: 5 August 2022; Accepted: 10 February 2023; Published: 2 March 2023
KEYWORDS
Laser induced plasma spectroscopy

Feature extraction

Feature fusion

Education and training

Design and modelling

Video

Motion models
