TY - JOUR
T1 - Arousal recognition using audio-visual features and FMRI-based brain response
AU - Han, Junwei
AU - Ji, Xiang
AU - Hu, Xintao
AU - Guo, Lei
AU - Liu, Tianming
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/10/1
Y1 - 2015/10/1
N2 - As an indicator of emotional intensity, arousal is an important cue for helping users find content of interest; effective techniques for video arousal recognition are therefore in high demand. In this paper, we propose a novel framework for recognizing arousal levels by integrating low-level audio-visual features derived from video content with the human brain's functional activity in response to videos, as measured by functional magnetic resonance imaging (fMRI). First, a set of audio-visual features that have been shown to correlate with video arousal is extracted. Then, fMRI-derived features that capture the brain activity involved in comprehending videos are extracted from a number of brain regions of interest (ROIs) identified by a universal brain reference system. Finally, these two sets of features are integrated to learn a joint representation using a multimodal deep Boltzmann machine (DBM). The learned joint representation can then be used as the feature for training classifiers. Because fMRI scanning is expensive and time-consuming, the DBM fusion model is able to infer the joint representation of videos for which no fMRI scans are available. Experimental results on a video benchmark demonstrate the effectiveness of our framework and the superiority of the integrated features.
AB - As an indicator of emotional intensity, arousal is an important cue for helping users find content of interest; effective techniques for video arousal recognition are therefore in high demand. In this paper, we propose a novel framework for recognizing arousal levels by integrating low-level audio-visual features derived from video content with the human brain's functional activity in response to videos, as measured by functional magnetic resonance imaging (fMRI). First, a set of audio-visual features that have been shown to correlate with video arousal is extracted. Then, fMRI-derived features that capture the brain activity involved in comprehending videos are extracted from a number of brain regions of interest (ROIs) identified by a universal brain reference system. Finally, these two sets of features are integrated to learn a joint representation using a multimodal deep Boltzmann machine (DBM). The learned joint representation can then be used as the feature for training classifiers. Because fMRI scanning is expensive and time-consuming, the DBM fusion model is able to infer the joint representation of videos for which no fMRI scans are available. Experimental results on a video benchmark demonstrate the effectiveness of our framework and the superiority of the integrated features.
KW - Affective computing
KW - Arousal recognition
KW - FMRI-derived features
KW - Multimodal DBM
UR - http://www.scopus.com/inward/record.url?scp=84946950523&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2015.2411280
DO - 10.1109/TAFFC.2015.2411280
M3 - Article
AN - SCOPUS:84946950523
SN - 1949-3045
VL - 6
SP - 337
EP - 347
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 4
M1 - 7056522
ER -