Hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) based speech emotion recognition

Longfei Li, Yong Zhao, Dongmei Jiang, Yanning Zhang, Fengna Wang, Isabel Gonzalez, Enescu Valentin, Hichem Sahli

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

143 citations (Scopus)

Abstract

Deep Neural Network Hidden Markov Models (DNN-HMMs) have recently emerged as very promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann Machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments are carried out with these two models on the eNTERFACE'05 database and the Berlin database, respectively, and the results are compared with those from GMM-HMMs, shallow-NN-HMMs with two layers, and Multi-layer Perceptron HMMs (MLP-HMMs). Experimental results show that when the numbers of hidden layers and hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, their recognition accuracy is 12.22% higher than that of the DNN-HMMs with unsupervised pre-training, 11.67% higher than that of the GMM-HMMs, 10.56% higher than that of the MLP-HMMs, and 17.22% higher than that of the shallow-NN-HMMs.

Original language: English
Title of host publication: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
Pages: 312-317
Number of pages: 6
DOI
Publication status: Published - 2013
Event: 2013 5th Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013 - Geneva, Switzerland
Duration: 2 Sep 2013 to 5 Sep 2013

Publication series

Name: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013

Conference

Conference: 2013 5th Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
Country/Territory: Switzerland
City: Geneva
Period: 2/09/13 to 5/09/13
