Open Access Paper
Attention enhanced dynamic kernel convolution for TDNN-based speaker verification
28 December 2022
Xiaofan Lang, Ya Li
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 1250605 (2022) https://doi.org/10.1117/12.2662523
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
Speaker embedding is a state-of-the-art front-end module used to extract discriminative speaker features for speaker-related tasks. The Time Delay Neural Network (TDNN) has been a classical network architecture since it was first applied to speaker-related tasks as the X-vector. In this paper, we propose new network structures based on the currently popular ECAPA-TDNN. We propose a dynamic kernel convolution module that adaptively extracts features from short-term and long-term context, thus achieving multi-scale receptive fields. We also apply three enhanced attention modules in place of the plain Squeeze-Excitation (SE) layer to realize more efficient information interaction between channels and spaces. The proposed architectures outperform the most advanced network, with a best Equal Error Rate (EER) of 6.40% and a parameter reduction of 6.32%. They also achieve better performance when speaker utterances are shortened.
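The EER reported in the abstract is the operating point at which the false acceptance rate equals the false rejection rate of a verification system. As a minimal sketch of how such a figure can be computed from trial scores, the following pure-Python function sweeps a decision threshold over the observed scores and returns the error rate where the two are closest. The function name, the threshold sweep, and the sample scores are illustrative assumptions, not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction in [0, 1]) from trial scores.

    target_scores: scores of same-speaker trials (higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    Assumption: thresholds are taken only at observed score values.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2  # midpoint where the two rates are closest
    return eer


# Hypothetical scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```

With these made-up scores, a threshold of 0.6 rejects one of four target trials and accepts one of four non-target trials, so both error rates are 25%. Evaluation toolkits typically interpolate between thresholds, so their results can differ slightly from this sketch.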
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaofan Lang and Ya Li "Attention enhanced dynamic kernel convolution for TDNN-based speaker verification", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 1250605 (28 December 2022); https://doi.org/10.1117/12.2662523
KEYWORDS
Convolution
Speaker recognition
Feature extraction
Performance modeling
Systems modeling
Astatine
Network architectures
