Paper
Robust speech recognition using missing feature theory and target speech enhancement based on degenerate unmixing and estimation technique
3 June 2011
Abstract
A method for target speech enhancement based on the degenerate unmixing and estimation technique (DUET) is described. Conventional DUET must know the number of sources in advance and must estimate the attenuation and delay parameters of every source. To avoid these requirements, the method assumes that only one target signal needs to be extracted, which is often plausible in real-world applications such as speech enhancement. By estimating the parameters of the target source only, the method recovers the target speech efficiently, converges quickly, and does not need to know the number of sources in advance. To achieve robust speech recognition, we propose an algorithm that applies cluster-based missing feature reconstruction to the log-spectral features of the enhanced speech during the extraction of mel-frequency cepstral coefficients (MFCCs). The algorithm identifies missing time-frequency regions by computing signal-to-noise ratios (SNRs) from the log-spectral features of the enhanced speech and of the observed noisy speech, and by marking the time-frequency segments whose SNRs fall below a threshold. The missing regions are filled by bounded estimation, using the log-spectral features considered reliable and knowledge of the log-spectral feature cluster to which the incoming target speech is assumed to belong. The log-spectral features are then transformed into cepstral features in the usual manner of MFCC extraction. Experimental results show that the proposed algorithm significantly improves recognition performance in noisy environments.
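As a rough illustration of the missing-feature steps outlined in the abstract, the following Python sketch marks low-SNR time-frequency cells, fills them with a bounded estimate, and converts the result to cepstra. It is not the authors' implementation. Several details are assumptions made for the example: the noise energy is taken as the noisy energy minus the enhanced-speech energy, clusters are diagonal Gaussians with a hard assignment per frame, the observed noisy value serves as the upper bound, and the function names, the 0 dB default threshold, and the unnormalized DCT-II are all hypothetical choices.

```python
import numpy as np


def estimate_missing_mask(log_enh, log_noisy, snr_threshold_db=0.0):
    """Return a boolean (frames, bands) mask; True marks a missing cell.

    Inputs are natural-log spectral energies. Assumption: noise energy is
    approximated as noisy energy minus enhanced-speech energy.
    """
    speech = np.exp(log_enh)
    noise = np.maximum(np.exp(log_noisy) - speech, 1e-10)  # avoid division by zero
    snr_db = 10.0 * np.log10(speech / noise)
    return snr_db < snr_threshold_db


def reconstruct_bounded(log_enh, log_noisy, missing, means, variances):
    """Fill missing cells using the best-matching cluster, capped by log_noisy.

    means, variances: (clusters, bands) parameters of diagonal Gaussians
    (an assumed, simplified cluster model).
    """
    out = log_enh.copy()
    for t in range(log_enh.shape[0]):
        rel = ~missing[t]
        # Score each cluster on the reliable bands only.
        diff = log_enh[t, rel] - means[:, rel]
        loglik = -0.5 * np.sum(
            diff ** 2 / variances[:, rel] + np.log(variances[:, rel]), axis=1
        )
        k = int(np.argmax(loglik))
        # For a diagonal Gaussian, the estimate bounded above by the noisy
        # observation is the cluster mean clipped to that observation.
        out[t, missing[t]] = np.minimum(means[k, missing[t]], log_noisy[t, missing[t]])
    return out


def log_spectra_to_cepstra(log_spec, n_ceps=13):
    """Apply an unnormalized DCT-II along the band axis."""
    n_bands = log_spec.shape[1]
    n = np.arange(n_bands)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands))
    return log_spec @ basis.T
```

In this sketch the three functions would be chained per utterance: compute the mask, reconstruct the masked cells, then pass the reconstructed log spectra to `log_spectra_to_cepstra`. A full-covariance cluster model or soft cluster posteriors would change only `reconstruct_bounded`.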
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Minook Kim, Ji-Seon Kim, and Hyung-Min Park "Robust speech recognition using missing feature theory and target speech enhancement based on degenerate unmixing and estimation technique", Proc. SPIE 8058, Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering IX, 80580D (3 June 2011); https://doi.org/10.1117/12.883340
CITATIONS
Cited by 3 scholarly publications.
KEYWORDS
Time-frequency analysis
Detection and tracking algorithms
Speech recognition
Reconstruction algorithms
Signal attenuation
Signal to noise ratio
Binary data
