TY - GEN
T1 - Hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) based speech emotion recognition
AU - Li, Longfei
AU - Zhao, Yong
AU - Jiang, Dongmei
AU - Zhang, Yanning
AU - Wang, Fengna
AU - Gonzalez, Isabel
AU - Enescu, Valentin
AU - Sahli, Hichem
PY - 2013
Y1 - 2013
AB - Deep Neural Network Hidden Markov Models (DNN-HMMs) have recently emerged as promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments with these two models are carried out on the eNTERFACE'05 database and the Berlin database, respectively, and the results are compared with those from GMM-HMMs, shallow-NN-HMMs with two layers, and Multi-Layer Perceptron HMMs (MLP-HMMs). Experimental results show that when the numbers of hidden layers and hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, their recognition accuracy is 12.22% higher than that of the DNN-HMMs with unsupervised pre-training, 11.67% higher than the GMM-HMMs, 10.56% higher than the MLP-HMMs, and 17.22% higher than the shallow-NN-HMMs.
UR - http://www.scopus.com/inward/record.url?scp=84893307972&partnerID=8YFLogxK
U2 - 10.1109/ACII.2013.58
DO - 10.1109/ACII.2013.58
M3 - Conference contribution
AN - SCOPUS:84893307972
SN - 9780769550480
T3 - Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
SP - 312
EP - 317
BT - Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
T2 - 2013 5th Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
Y2 - 2 September 2013 through 5 September 2013
ER -
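
Note: the abstract above describes a hybrid DNN-HMM recipe in which each layer is pre-trained as an RBM without labels and the DNN's state posteriors, scaled by state priors, serve as HMM emission likelihoods. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The feature dimensionality, layer sizes, number of emotion states, and learning rate are illustrative assumptions, and the supervised fine-tuning (backpropagation) step of the full recipe is omitted.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, lr=0.01, epochs=5):
    """Unsupervised pre-training of one RBM layer with one-step
    contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # positive phase: hidden activations driven by the data
        h_prob = sigmoid(data @ W + b_h)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step back through the visible layer
        v_recon = sigmoid(h_samp @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 gradient estimate
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

# toy frame-level acoustic features and frame-to-HMM-state labels
# (39-dim features and 6 states are assumptions, not the paper's setup)
X = rng.random((500, 39))
y = rng.integers(0, 6, 500)

# greedy layer-wise RBM pre-training of a two-hidden-layer DNN
layers, inp = [], X
for n_hidden in (128, 128):
    W, b = pretrain_rbm(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)

# randomly initialized softmax output layer (fine-tuned with
# backpropagation in the full recipe; omitted here)
W_out = 0.01 * rng.standard_normal((128, 6))
b_out = np.zeros(6)

def dnn_posteriors(x):
    """Forward pass through the pre-trained stack plus softmax,
    giving per-frame state posteriors P(state | frame)."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    logits = x @ W_out + b_out
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# hybrid step: convert posteriors into scaled likelihoods,
# P(frame | state) proportional to P(state | frame) / P(state),
# which a Viterbi decoder would use as HMM emission scores
priors = np.bincount(y, minlength=6) / len(y)
scaled_likelihoods = dnn_posteriors(X) / priors

Dividing posteriors by priors is the standard hybrid NN/HMM trick for turning a discriminatively trained frame classifier into an emission model; the discriminative pre-training variant compared in the paper would replace pretrain_rbm with layer-wise supervised training, leaving the rest of the pipeline unchanged.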