TY - GEN
T1 - Hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) based speech emotion recognition
AU - Li, Longfei
AU - Zhao, Yong
AU - Jiang, Dongmei
AU - Zhang, Yanning
AU - Wang, Fengna
AU - Gonzalez, Isabel
AU - Enescu, Valentin
AU - Sahli, Hichem
PY - 2013
Y1 - 2013
AB - Deep Neural Network Hidden Markov Models (DNN-HMMs) have recently emerged as promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments with these two models are carried out on the eNTERFACE'05 database and the Berlin database, respectively, and the results are compared with those from GMM-HMMs, shallow-NN-HMMs with two layers, and Multi-Layer Perceptron HMMs (MLP-HMMs). Experimental results show that when the numbers of hidden layers and hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, their recognition accuracy is 12.22% higher than that of the DNN-HMMs with unsupervised pre-training, 11.67% higher than the GMM-HMMs, 10.56% higher than the MLP-HMMs, and 17.22% higher than the shallow-NN-HMMs.
UR - http://www.scopus.com/inward/record.url?scp=84893307972&partnerID=8YFLogxK
U2 - 10.1109/ACII.2013.58
DO - 10.1109/ACII.2013.58
M3 - Conference contribution
AN - SCOPUS:84893307972
SN - 9780769550480
T3 - Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
SP - 312
EP - 317
BT - Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
T2 - 2013 5th Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
Y2 - 2 September 2013 through 5 September 2013
ER -
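
Note: the abstract above describes a hybrid DNN-HMM recipe in which each layer is pre-trained as an RBM without labels and the DNN's state posteriors, scaled by state priors, serve as HMM emission likelihoods. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The feature dimensionality, layer sizes, number of emotion states, and learning rate are illustrative assumptions, and the supervised fine-tuning (backpropagation) step of the full recipe is omitted.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, lr=0.01, epochs=5):
    """Unsupervised pre-training of one RBM layer with one-step
    contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # positive phase: hidden activations driven by the data
        h_prob = sigmoid(data @ W + b_h)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step back through the visible layer
        v_recon = sigmoid(h_samp @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 gradient estimate
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

# toy frame-level acoustic features and frame-to-HMM-state labels
# (39-dim features and 6 states are assumptions, not the paper's setup)
X = rng.random((500, 39))
y = rng.integers(0, 6, 500)

# greedy layer-wise RBM pre-training of a two-hidden-layer DNN
layers, inp = [], X
for n_hidden in (128, 128):
    W, b = pretrain_rbm(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)

# randomly initialized softmax output layer (fine-tuned with
# backpropagation in the full recipe; omitted here)
W_out = 0.01 * rng.standard_normal((128, 6))
b_out = np.zeros(6)

def dnn_posteriors(x):
    """Forward pass through the pre-trained stack plus softmax,
    giving per-frame state posteriors P(state | frame)."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    logits = x @ W_out + b_out
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# hybrid step: convert posteriors into scaled likelihoods,
# P(frame | state) proportional to P(state | frame) / P(state),
# which a Viterbi decoder would use as HMM emission scores
priors = np.bincount(y, minlength=6) / len(y)
scaled_likelihoods = dnn_posteriors(X) / priors

Dividing posteriors by priors is the standard hybrid NN/HMM trick for turning a discriminatively trained frame classifier into an emission model; the discriminative pre-training variant compared in the paper would replace pretrain_rbm with layer-wise supervised training, leaving the rest of the pipeline unchanged.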