Multimodal continuous affect recognition based on LSTM and multiple kernel learning

Jiamei Wei; Ercheng Pei; Dongmei Jiang; Hichem Sahli; Lei Xie; Zhonghua Fu

doi:10.1109/APSIPA.2014.7041743


School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

In this paper, we propose a multi-modal affect recognition scheme (LSTM-MKL) based on a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and multiple kernel learning (MKL). It exploits the ability of the LSTM-RNN to model long-range dependencies between successive observations, and the power of MKL to model non-linear correlations between the inputs and outputs. For each of the affect dimensions (arousal, valence, expectancy, and power), two LSTM-RNN models are trained, one for each modality. In the recognition phase, the audio and visual features are input to the corresponding learned LSTM models, which in turn produce initial estimates of the affect dimensions. The LSTM outputs are then input into a multi-kernel support vector regression (MK-SVR) for the final recognition. Experimental results on the AVEC2012 database show that, compared to the traditional SVR-LLR (Support Vector Machine - local linear regression) or MK-SVR fusion schemes, the proposed LSTM-MKL fusion scheme obtains higher recognition results, with a correlation coefficient (COR) of 0.354, compared to a COR of 0.124 for SVR-LLR and 0.168 for MK-SVR.
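The second-stage fusion and the COR metric described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each sample is a hypothetical pair (audio estimate, video estimate) of per-modality LSTM outputs, that the "multi-kernel" part is a fixed-weight sum of two RBF kernels (real MKL would learn the weights), and it swaps in a simple kernel smoother as a stand-in for MK-SVR so the example needs no solver. All names, weights, and toy values are illustrative assumptions.

```python
import math


def pearson_cor(pred, gold):
    """Pearson correlation coefficient (COR) between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    return cov / math.sqrt(var_p * var_g)


def rbf(a, b, gamma):
    """RBF kernel on two scalar values."""
    return math.exp(-gamma * (a - b) ** 2)


def combined_kernel(x, y, weights=(0.5, 0.5), gammas=(1.0, 1.0)):
    """Fixed-weight sum of one RBF kernel per modality.

    x and y are (audio_estimate, video_estimate) pairs. The weights are
    assumed here; a real MKL method would learn them from data.
    """
    return sum(w * rbf(xi, yi, g) for w, g, xi, yi in zip(weights, gammas, x, y))


def kernel_smoother_predict(train_x, train_y, query):
    """Kernel-weighted average of training targets (a stand-in for MK-SVR)."""
    sims = [combined_kernel(x, query) for x in train_x]
    return sum(s * y for s, y in zip(sims, train_y)) / sum(sims)


# Toy data: hypothetical per-modality LSTM estimates and ground-truth labels.
train_x = [(0.1, 0.2), (0.4, 0.3), (0.7, 0.8), (0.9, 0.6)]
train_y = [0.15, 0.35, 0.75, 0.80]

fused = [kernel_smoother_predict(train_x, train_y, x) for x in train_x]
cor = pearson_cor(fused, train_y)
print(round(cor, 3))
```

The point of the sketch is the structure: per-modality estimates are compared through separate kernels, the kernels are merged into one similarity, and the fused prediction is scored against the labels with the correlation coefficient.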

Original language	English
Title of host publication	2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9786163618238
DOIs	https://doi.org/10.1109/APSIPA.2014.7041743
State	Published - 12 Feb 2014
Event	2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014 - Chiang Mai, Thailand Duration: 9 Dec 2014 → 12 Dec 2014

Publication series

Name	2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014

Conference

Conference	2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
Country/Territory	Thailand
City	Chiang Mai
Period	9/12/14 → 12/12/14

Access to Document

10.1109/APSIPA.2014.7041743

Cite this

Wei, J., Pei, E., Jiang, D., Sahli, H., Xie, L., & Fu, Z. (2014). Multimodal continuous affect recognition based on LSTM and multiple kernel learning. In 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, Article 7041743 (2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2014.7041743

Wei, Jiamei ; Pei, Ercheng ; Jiang, Dongmei et al. / Multimodal continuous affect recognition based on LSTM and multiple kernel learning. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014. Institute of Electrical and Electronics Engineers Inc., 2014. (2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014).

@inproceedings{6ff3b518769b4bd68293baf9a161aa7b,

title = "Multimodal continuous affect recognition based on LSTM and multiple kernel learning",

abstract = "In this paper, we propose a multi-modal affect recognition scheme (LSTM-MKL) based on a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and multiple kernel learning (MKL). It exploits the ability of the LSTM-RNN to model long-range dependencies between successive observations, and the power of MKL to model non-linear correlations between the inputs and outputs. For each of the affect dimensions (arousal, valence, expectancy, and power), two LSTM-RNN models are trained, one for each modality. In the recognition phase, the audio and visual features are input to the corresponding learned LSTM models, which in turn produce initial estimates of the affect dimensions. The LSTM outputs are then input into a multi-kernel support vector regression (MK-SVR) for the final recognition. Experimental results on the AVEC2012 database show that, compared to the traditional SVR-LLR (Support Vector Machine - local linear regression) or MK-SVR fusion schemes, the proposed LSTM-MKL fusion scheme obtains higher recognition results, with a correlation coefficient (COR) of 0.354, compared to a COR of 0.124 for SVR-LLR and 0.168 for MK-SVR.",

author = "Jiamei Wei and Ercheng Pei and Dongmei Jiang and Hichem Sahli and Lei Xie and Zhonghua Fu",

note = "Publisher Copyright: {\textcopyright} 2014 Asia-Pacific Signal and Information Processing Ass.; 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014 ; Conference date: 09-12-2014 Through 12-12-2014",

year = "2014",

month = feb,

day = "12",

doi = "10.1109/APSIPA.2014.7041743",

language = "English",

series = "2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014",

}

Wei, J, Pei, E, Jiang, D, Sahli, H, Xie, L & Fu, Z 2014, Multimodal continuous affect recognition based on LSTM and multiple kernel learning. in 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, 7041743, 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, Institute of Electrical and Electronics Engineers Inc., 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, Chiang Mai, Thailand, 9/12/14. https://doi.org/10.1109/APSIPA.2014.7041743

Multimodal continuous affect recognition based on LSTM and multiple kernel learning. / Wei, Jiamei; Pei, Ercheng; Jiang, Dongmei et al.
2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014. Institute of Electrical and Electronics Engineers Inc., 2014. 7041743 (2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014).


TY - GEN

T1 - Multimodal continuous affect recognition based on LSTM and multiple kernel learning

AU - Wei, Jiamei

AU - Pei, Ercheng

AU - Jiang, Dongmei

AU - Sahli, Hichem

AU - Xie, Lei

AU - Fu, Zhonghua

PY - 2014/2/12

Y1 - 2014/2/12

N2 - In this paper, we propose a multi-modal affect recognition scheme (LSTM-MKL) based on a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and multiple kernel learning (MKL). It exploits the ability of the LSTM-RNN to model long-range dependencies between successive observations, and the power of MKL to model non-linear correlations between the inputs and outputs. For each of the affect dimensions (arousal, valence, expectancy, and power), two LSTM-RNN models are trained, one for each modality. In the recognition phase, the audio and visual features are input to the corresponding learned LSTM models, which in turn produce initial estimates of the affect dimensions. The LSTM outputs are then input into a multi-kernel support vector regression (MK-SVR) for the final recognition. Experimental results on the AVEC2012 database show that, compared to the traditional SVR-LLR (Support Vector Machine - local linear regression) or MK-SVR fusion schemes, the proposed LSTM-MKL fusion scheme obtains higher recognition results, with a correlation coefficient (COR) of 0.354, compared to a COR of 0.124 for SVR-LLR and 0.168 for MK-SVR.

UR - http://www.scopus.com/inward/record.url?scp=84949925169&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2014.7041743

DO - 10.1109/APSIPA.2014.7041743

M3 - Conference contribution

AN - SCOPUS:84949925169

T3 - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014

BT - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014

Y2 - 9 December 2014 through 12 December 2014

ER -

Wei J, Pei E, Jiang D, Sahli H, Xie L, Fu Z. Multimodal continuous affect recognition based on LSTM and multiple kernel learning. In 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014. Institute of Electrical and Electronics Engineers Inc. 2014. 7041743. (2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014). doi: 10.1109/APSIPA.2014.7041743
