Multi-objective noisy-based deep feature loss for speech enhancement
6 November 2019
Rafal Pilarczyk, Władysław Skarbek
Proceedings Volume 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019; 111762W (2019) https://doi.org/10.1117/12.2536967
Event: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, 2019, Wilga, Poland
Abstract
Deep neural networks have become an effective tool for denoising speech signals, improving intelligibility, speech quality, and signal-to-noise ratio. An important element in training deep speech networks is the choice of a loss function that improves both subjective and objective measures. In our work, we used a loss function based on a well-trained deep network that classifies whether the signal is noisy or clean. The denoising network is then trained by minimizing the difference between the deep features of the clean and the enhanced signal. Our work shows that using only deep features in the loss function yields a significant improvement in measures of speech signal quality. A further novelty is the feature extractor, which has been trained as a multi-objective noise classifier. We believe that deep-feature loss could help in optimizing objectives that are difficult to differentiate.
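The idea of a deep-feature loss can be illustrated with a small sketch. The abstract does not state which layers, distance measure, or weighting the authors use, so everything below is an assumption made for illustration: the two-layer extractor, its fixed weights in `LAYERS`, the mean absolute difference per layer, and the function names are all hypothetical. The frozen extractor stands in for the pretrained noise classifier, and the loss compares its activations for the clean and the enhanced signal.

```python
# Toy sketch of a deep-feature loss in plain Python.
# All names, weights, and layer sizes are hypothetical; the abstract
# does not specify the architecture or the distance measure.

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, v):
    # weights is a list of rows; returns the matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def extract_features(signal, layers):
    """Run a frozen feature extractor and return every layer's activations."""
    activations = []
    h = signal
    for weights in layers:
        h = relu(linear(weights, h))
        activations.append(h)
    return activations

def deep_feature_loss(clean, enhanced, layers):
    """Sum over layers of the mean absolute difference between activations."""
    feats_clean = extract_features(clean, layers)
    feats_enhanced = extract_features(enhanced, layers)
    loss = 0.0
    for fc, fe in zip(feats_clean, feats_enhanced):
        loss += sum(abs(a - b) for a, b in zip(fc, fe)) / len(fc)
    return loss

# Hypothetical frozen weights standing in for a pretrained noise classifier.
LAYERS = [
    [[1.0, -1.0, 0.5], [0.5, 0.5, -1.0]],  # 3 input samples -> 2 features
    [[1.0, 1.0], [-1.0, 2.0]],             # 2 features -> 2 features
]

clean = [0.2, -0.1, 0.4]
enhanced = [0.25, -0.05, 0.3]

print(deep_feature_loss(clean, clean, LAYERS))     # identical signals give 0.0
print(deep_feature_loss(clean, enhanced, LAYERS))  # positive when signals differ
```

In an actual training setup the extractor's weights would stay fixed while the gradient of this loss with respect to the enhanced signal is propagated back into the denoising network; that step requires an automatic differentiation framework and is omitted here.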
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Rafal Pilarczyk and Władysław Skarbek "Multi-objective noisy-based deep feature loss for speech enhancement", Proc. SPIE 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, 111762W (6 November 2019); https://doi.org/10.1117/12.2536967
CITATIONS
Cited by 1 scholarly publication.
KEYWORDS
Signal to noise ratio
Time-frequency analysis
Neural networks
Convolution
Denoising
Fourier transforms
