Emotion Recognition Based on Speech Signals by Combining Empirical Mode Decomposition and Deep Neural Network

Shing-Tai Pan; Ching-Fa Chen; Chuan-Cheng Hong

doi:10.54646/bijiam.2023.11

How to Cite

Pan, S.-T., Chen, C.-F., & Hong, C.-C. (2023). Emotion Recognition Based on Speech Signals by Combining Empirical Mode Decomposition and Deep Neural Network. BOHR International Journal of Internet of Things, Artificial Intelligence and Machine Learning, 2(1), 1–10. https://doi.org/10.54646/bijiam.2023.11

Published: Oct 18, 2023

Updated: 2023-10-18

Keywords:

Speech emotion recognition, empirical mode decomposition, deep neural network, Mel-scale Frequency Cepstral Coefficients, hidden Markov model

Authors

Shing-Tai Pan

Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.

Ching-Fa Chen

Department of Electronic Engineering, Kao Yuan University, Kaohsiung, Taiwan, R.O.C.

Chuan-Cheng Hong

Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.

Abstract

This paper proposes a novel method for speech emotion recognition in which empirical mode decomposition (EMD) is used to extract emotional features from speech and a deep neural network (DNN) is used to classify the emotions. By combining EMD with the acoustic feature Mel-scale Frequency Cepstral Coefficients (MFCCs), the method enhances the emotional components of speech signals and thereby improves the emotion recognition rate of the DNN classifier. First, EMD decomposes speech signals containing emotional components into multiple intrinsic mode functions (IMFs); emotional features are then derived from the IMFs by computing their MFCCs. Next, these features are used to train the DNN model. Finally, the trained model is used to identify emotions in speech. Experimental results show that the proposed method is effective.

Issue

Vol. 2 No. 1 (2023): BOHR International Journal of Internet of Things, Artificial Intelligence and Machine Learning (BIJIAM)

Section

Methods
