16 December 2021 Speaker recognition method based on deep residual network and improved power normalized Cepstral coefficients features
Runhua He, Pan Li, Xuemei Li, Shuhang Chen
Proceedings Volume 12153, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2021); 121530B (2021) https://doi.org/10.1117/12.2626663
Event: International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2021), 2021, Sanya, China
Abstract
In recent years, speaker recognition technology has achieved more than 90% accuracy in quiet environments, but its accuracy degrades in noisy environments. To address speaker recognition in noisy environments, this study proposes a method based on a deep residual network and improved Power Normalized Cepstral Coefficients (PNCC) features. The PNCC feature extraction algorithm is improved so that it is suitable for extracting feature parameters in noisy environments. To further improve training, a deep residual network is used to train the model, which effectively improves model accuracy. In the recognition stage, to suppress the influence of noise, this study proposes a variational mode decomposition (VMD) algorithm based on wavelet threshold denoising to improve the accuracy of speaker recognition. The experimental results show that, compared with MFCC and standard PNCC feature extraction, the improved PNCC feature extraction method is better suited to extracting features from speech signals in noisy environments. Compared with other speaker recognition methods, the proposed method based on a deep residual network and improved PNCC features achieves higher recognition accuracy.
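The abstract gives no details of the wavelet threshold denoising step, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a one-level Haar wavelet, soft thresholding of the detail coefficients, and an even-length signal; the function names and the threshold value are hypothetical, and the VMD stage is omitted entirely.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (soft thresholding)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def haar_forward(signal):
    """One-level Haar transform of an even-length signal (assumed wavelet)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert the one-level Haar transform."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def denoise(signal, threshold):
    """Threshold only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the signal is reconstructed unchanged; with a very large threshold every detail coefficient is removed and each sample pair is replaced by its mean. A practical system would use a multi-level transform and a data-driven threshold, neither of which is specified in the abstract.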
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Runhua He, Pan Li, Xuemei Li, and Shuhang Chen "Speaker recognition method based on deep residual network and improved power normalized Cepstral coefficients features", Proc. SPIE 12153, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2021), 121530B (16 December 2021); https://doi.org/10.1117/12.2626663
KEYWORDS
Speaker recognition
Feature extraction
Detection and tracking algorithms
Signal processing
Wavelets
Denoising
Interference (communication)