13 April 2023 Speech emotion analysis based on vision transformer
Xiaogang Huang, Qifeng Zheng, Yuanyuan Zhang, Dong Cheng, Yuting Liu, Chen Dong
Proceedings Volume 12605, 2022 2nd Conference on High Performance Computing and Communication Engineering (HPCCE 2022); 126051K (2023) https://doi.org/10.1117/12.2673332
Event: Second Conference on High Performance Computing and Communication Engineering, 2022, Harbin, China
Abstract
Emotion is an essential aspect of human life, and reliably identifying emotions across different scenarios helps advance human-computer interaction systems. Emotion classification has therefore become a popular and challenging research field. Compared with text emotion analysis, emotion analysis of audio data is still relatively immature. Traditional audio sentiment analysis relies on features such as MFCC and MFSC, combined with recurrent sequence models such as RNN and LSTM. With the rapid development of transformers and attention mechanisms, many researchers have shifted from the RNN family to the transformer family or to other deep learning models with attention mechanisms. This paper therefore proposes converting audio data into spectrograms and classifying emotions with a vision transformer model based on transfer learning. Experiments are conducted on the IEMOCAP and MELD datasets. The results show that the vision transformer reaches an emotion classification accuracy of 56.18% on IEMOCAP and 37.1% on MELD.
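The front end of the pipeline described in the abstract (audio to spectrogram, spectrogram to the patch sequence a vision transformer consumes) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the abstract does not state the spectrogram parameters, image size, or patch size, so the FFT length of 256, hop of 128, 16x16 patches, and the helper names `log_spectrogram` and `to_patches` are all assumptions. A synthetic 440 Hz tone stands in for a speech clip, and the pretrained transformer and its fine-tuning are left out.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are assumed values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    # log1p compresses the dynamic range; transpose to (frequency, time).
    return np.log1p(magnitude).T


def to_patches(image, patch=16):
    """Crop to a multiple of `patch` and return flattened non-overlapping patches."""
    h = (image.shape[0] // patch) * patch
    w = (image.shape[1] // patch) * patch
    grid = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    # Reorder to (patch row, patch column, pixel row, pixel column), then flatten.
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


# Synthetic one-second tone as a stand-in for a speech recording (assumption).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)   # spectrogram "image"
patches = to_patches(spec)     # sequence of flattened 16x16 patches
print(spec.shape, patches.shape)
```

In a transfer-learning setup of the kind the abstract mentions, each flattened patch would then be linearly projected and passed to a pretrained vision transformer whose classification head is retrained for the emotion labels; that stage depends on the specific pretrained model and is not shown here.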
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaogang Huang, Qifeng Zheng, Yuanyuan Zhang, Dong Cheng, Yuting Liu, and Chen Dong "Speech emotion analysis based on vision transformer", Proc. SPIE 12605, 2022 2nd Conference on High Performance Computing and Communication Engineering (HPCCE 2022), 126051K (13 April 2023); https://doi.org/10.1117/12.2673332
KEYWORDS
Emotion

Data modeling

Transformers

Visual process modeling

Analytical research

Data conversion

Machine learning
