Lipreading model based on a two-way convolutional neural network and feature fusion
Meili Zhu, Qingqing Wang, Yingying Ge
Abstract

Lipreading feature extraction is essentially feature extraction from continuous video frame sequences. A lipreading model based on a two-way convolutional neural network and feature fusion is proposed to obtain more reasonable visual spatial–temporal characteristics. Unlike other deep-learning-based lipreading methods, the rank pooling method transforms a lip video into a standard RGB image that can be input directly into the convolutional neural network, which effectively reduces the input dimension. In addition, to compensate for the lack of spatial information, the apparent shape and depth features are fused, and a joint cost function is then used to guide network learning toward more discriminative features. The experimental results were evaluated on the public GRID and OuluVS2 databases; the accuracy of the proposed method exceeds 93%, which validates its effectiveness.
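The rank pooling idea mentioned in the abstract can be sketched in generic form: a clip of T frames is collapsed into a single image by a weighted sum whose coefficients encode temporal order. The sketch below uses the closed-form approximate rank pooling ("dynamic image") coefficients of Bilen et al. (2016); this is an illustration of the general technique, not the authors' exact pipeline, and the flattened-grayscale frame representation is an assumption for simplicity.

```python
def approximate_rank_pooling(frames):
    """Collapse a list of T frames (each a flat list of pixel
    intensities) into one temporally ordered "dynamic image".

    Generic sketch of approximate rank pooling; the paper's actual
    preprocessing (RGB lip crops) is not reproduced here.
    """
    T = len(frames)
    # Harmonic numbers H_t = sum_{i=1}^{t} 1/i, with H_0 = 0
    H = [0.0]
    for i in range(1, T + 1):
        H.append(H[-1] + 1.0 / i)
    # Closed-form coefficients: alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1})
    # Early frames get negative weight, late frames positive weight,
    # so the pooled image encodes the temporal evolution of the clip.
    alpha = [2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
             for t in range(1, T + 1)]
    dyn = [0.0] * len(frames[0])
    for a, frame in zip(alpha, frames):
        for j, px in enumerate(frame):
            dyn[j] += a * px
    # Rescale to 0..255 so the result can be treated as a standard
    # 8-bit image and fed to a CNN, as the abstract describes.
    lo, hi = min(dyn), max(dyn)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [int(round((v - lo) * scale)) for v in dyn]
```

Because the coefficients sum the frames in closed form, the video-to-image conversion is a single weighted average rather than an iterative ranking optimization, which is what makes the reduced input dimension cheap to obtain.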

© 2021 SPIE and IS&T 1017-9909/2021/$28.00 © 2021 SPIE and IS&T
Meili Zhu, Qingqing Wang, and Yingying Ge "Lipreading model based on a two-way convolutional neural network and feature fusion," Journal of Electronic Imaging 30(6), 063003 (6 November 2021). https://doi.org/10.1117/1.JEI.30.6.063003
Received: 18 July 2021; Accepted: 15 October 2021; Published: 6 November 2021
CITATIONS: Cited by 1 scholarly publication.
KEYWORDS: Laser induced plasma spectroscopy, Video, Convolutional neural networks, Feature extraction, Databases, RGB color model, Information visualization
