TY - GEN
T1 - Multimodal continuous affect recognition based on LSTM and multiple kernel learning
AU - Wei, Jiamei
AU - Pei, Ercheng
AU - Jiang, Dongmei
AU - Sahli, Hichem
AU - Xie, Lei
AU - Fu, Zhonghua
N1 - Publisher Copyright:
© 2014 Asia-Pacific Signal and Information Processing Association.
PY - 2014/2/12
Y1 - 2014/2/12
AB - In this paper, we propose a multi-modal affect recognition scheme (LSTM-MKL) based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) and multiple kernel learning (MKL). It exploits the ability of the LSTM-RNN to model long-range dependencies between successive observations, and the power of MKL to model the non-linear correlations between inputs and outputs. For each of the affect dimensions (arousal, valence, expectancy, and power), two LSTM-RNN models are trained, one for each modality. In the recognition phase, the audio and visual features are fed into the corresponding learned LSTM models, which produce initial estimates of the affect dimensions. The LSTM outputs are then passed to a multi-kernel support vector regression (MK-SVR) for the final recognition. Experimental results on the AVEC2012 database show that, compared to the traditional SVR-LLR (support vector regression with local linear regression) and MK-SVR fusion schemes, the proposed LSTM-MKL fusion scheme obtains higher recognition results, with a correlation coefficient (COR) of 0.354, against 0.124 for SVR-LLR and 0.168 for MK-SVR.
UR - http://www.scopus.com/inward/record.url?scp=84949925169&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2014.7041743
DO - 10.1109/APSIPA.2014.7041743
M3 - Conference contribution
AN - SCOPUS:84949925169
T3 - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
BT - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014
Y2 - 9 December 2014 through 12 December 2014
ER -