1 April 2024 Advancements and challenges in speech emotion recognition: a comprehensive review
Jiaxin Wang, Hao Yin, Yiding Zhou, Wei Xi
Proceedings Volume 13077, Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024); 130770D (2024) https://doi.org/10.1117/12.3027122
Event: 4th International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 2024, Chicago, IL, United States
Abstract
As human-computer interaction (HCI) grows in importance and deep learning evolves, numerous models have been applied to Speech Emotion Recognition (SER), leading to significant advances in recent years. However, recognizing and processing human emotions with computational systems remains a formidable challenge. This review summarizes the latest accomplishments in SER across a diverse range of application scenarios, from education and healthcare to criminal investigation. It also examines preprocessing techniques, models such as Convolutional Neural Networks (CNN), Convolutional Recurrent Neural Networks (CRNN), and Long Short-Term Memory (LSTM) networks, and datasets such as RAVDESS and RECOLA, which cover a wide array of scenes and languages. Although recent SER systems have achieved impressive accuracy, little research addresses more intricate emotional contexts, such as irony or sarcasm. Consequently, this review analyzes the limitations inherent in different feature engineering strategies. It also investigates the interpretability challenge posed by complex models, the constraint posed by singular and hard-to-gather datasets, and the broad range of applications SER could serve. Considering these complexities, a potential pathway to further enhance SER's effectiveness and applicability is proposed: exploring non-binary emotion classification, harnessing rich contextual information, and integrating datasets that incorporate gesture and textual data. Adapting feature extraction techniques to the demands of specific scenarios could markedly improve the performance of SER models.
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jiaxin Wang, Hao Yin, Yiding Zhou, and Wei Xi "Advancements and challenges in speech emotion recognition: a comprehensive review", Proc. SPIE 13077, Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770D (1 April 2024); https://doi.org/10.1117/12.3027122
KEYWORDS
Emotion
Feature extraction
Data modeling
Speech recognition
Surface enhanced Raman spectroscopy
Systems modeling
Deep learning