6 November 2019 Non-native speech recognition using audio style transfer
Proceedings Volume 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019; 111762J (2019) https://doi.org/10.1117/12.2536535
Event: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, 2019, Wilga, Poland
Abstract
Automatic speech recognition (ASR) systems have recently been achieving higher and higher accuracy rates. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language to be recognized, mainly because of specific pronunciation and accent features. The limited volume of labeled datasets containing samples of non-native speech makes it difficult to train new ASR systems targeted at non-native speakers. In our research, we sought to tackle the problem of non-native accent and its influence on the accuracy of ASR systems using a style transfer methodology. We designed a pipeline that modifies the speech produced by a non-native speaker so that it more closely resembles native speech, i.e., a method for accent neutralization. Our methodology can be used as a wrapper for any existing ASR system, which reduces the need to train new speech recognizers adapted to non-native speech. The modification can thus be performed on the fly, before the data is passed to the speech recognition system itself.
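The wrapper arrangement described in the abstract could be sketched roughly as follows. This is only an illustration of the "modify first, then recognize" structure, not the authors' implementation: the class name `AccentNeutralizingASR`, the `Audio` type, and both stand-in functions are hypothetical, and the style transfer model itself is not shown.

```python
from typing import Callable, List, Sequence

# Assumption for illustration: audio is a mono waveform of float samples.
Audio = Sequence[float]


class AccentNeutralizingASR:
    """Hypothetical wrapper: converts the audio first, then calls an existing recognizer."""

    def __init__(
        self,
        neutralize: Callable[[Audio], Audio],
        recognize: Callable[[Audio], str],
    ) -> None:
        self._neutralize = neutralize  # e.g. a style transfer model (not shown here)
        self._recognize = recognize    # any existing ASR system, left unchanged

    def transcribe(self, audio: Audio) -> str:
        # The modification happens on the fly, before recognition.
        converted = self._neutralize(audio)
        return self._recognize(converted)


# Stand-ins so the sketch runs; neither performs real signal processing.
def identity_neutralizer(audio: Audio) -> List[float]:
    return list(audio)


def dummy_recognizer(audio: Audio) -> str:
    return f"<{len(audio)} samples>"


wrapped = AccentNeutralizingASR(identity_neutralizer, dummy_recognizer)
print(wrapped.transcribe([0.0, 0.1, -0.1]))
```

Because the recognizer is passed in as a plain callable, the same wrapper could in principle sit in front of any ASR system without retraining it, which is the property the abstract emphasizes.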
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kacper Radzikowski, Mateusz Forc, Le Wang, Osamu Yoshie, and Robert M. Nowak "Non-native speech recognition using audio style transfer", Proc. SPIE 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, 111762J (6 November 2019); https://doi.org/10.1117/12.2536535
KEYWORDS
Speech recognition
Neural networks
Machine learning
Computer science
Computing systems
Convolutional neural networks
Databases