TY - GEN
T1 - Multimodal continuous affect recognition based on LSTM and multiple kernel learning
AU - Wei, Jiamei
AU - Pei, Ercheng
AU - Jiang, Dongmei
AU - Sahli, Hichem
AU - Xie, Lei
AU - Fu, Zhonghua
N1 - Publisher Copyright:
© 2014 Asia-Pacific Signal and Information Processing Association.
PY - 2014/2/12
Y1 - 2014/2/12
AB - In this paper, we propose a multi-modal affect recognition scheme (LSTM-MKL) based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) and multiple kernel learning (MKL). It exploits the ability of the LSTM-RNN to model long-range dependencies between successive observations, and the power of MKL to model the non-linear correlations between inputs and outputs. For each of the affect dimensions (arousal, valence, expectancy, and power), two LSTM-RNN models are trained, one for each modality. In the recognition phase, the audio and visual features are fed into the corresponding learned LSTM models, which produce initial estimates of the affect dimensions. The LSTM outputs are then passed to a multi-kernel support vector regression (MK-SVR) for the final recognition. Experimental results on the AVEC2012 database show that, compared to the traditional SVR-LLR (support vector regression with local linear regression) and MK-SVR fusion schemes, the proposed LSTM-MKL fusion scheme obtains higher recognition results, with a correlation coefficient (COR) of 0.354, against 0.124 for SVR-LLR and 0.168 for MK-SVR.
UR - http://www.scopus.com/inward/record.url?scp=84949925169&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2014.7041743
DO - 10.1109/APSIPA.2014.7041743
M3 - Conference contribution
AN - SCOPUS:84949925169
T3 - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
BT - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
Y2 - 9 December 2014 through 12 December 2014
ER -