Real-Time Video Emotion Recognition Based on Reinforcement Learning and Domain Knowledge

Ke Zhang; Yuanqing Li; Jingyu Wang; Erik Cambria; Xuelong Li

doi:10.1109/TCSVT.2021.3072412

School of Astronautics

Research output: Contribution to journal › Article › peer-review

130 Scopus citations

Abstract

Multimodal emotion recognition in conversational videos (ERC) has developed rapidly in recent years. To fully extract the relevant context from video clips, most studies build their models on entire dialogues, which leaves them without real-time ERC ability. Unlike related work, this paper proposes a novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK). In ERLDK, a reinforcement learning (RL) algorithm is introduced to conduct real-time ERC as conversations occur. The collection of history utterances is composed into an emotion-pair, which represents the multimodal context of the next utterance to be recognized. A dueling deep Q-network (DDQN) based on gated recurrent unit (GRU) layers is designed to learn the correct action from the candidate emotion categories. Domain knowledge is extracted from a public dataset based on the prior information of emotion-pairs. The extracted domain knowledge is used to revise the results from the RL module and is transferred to another dataset to examine its rationality. Experimental results on the datasets show that ERLDK achieves state-of-the-art results on the weighted average and on most of the specific emotion categories.
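
The abstract names a dueling deep Q-network that picks one action from the emotion categories. As a rough illustration only, the sketch below shows the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), followed by a greedy choice. It is not the authors' implementation: the label set `EMOTIONS`, the function names, and the numeric inputs are hypothetical, and the GRU layers that would produce the value and advantage estimates are omitted.

```python
from typing import List, Sequence

# Hypothetical label set for illustration; the abstract does not list the categories.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the common mean-advantage baseline: Q = V + A - mean(A).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def greedy_emotion(
    state_value: float,
    advantages: Sequence[float],
    labels: Sequence[str] = EMOTIONS,
) -> str:
    """Return the label whose Q-value is largest (greedy action selection)."""
    q_values = dueling_q_values(state_value, advantages)
    best_index = max(range(len(q_values)), key=q_values.__getitem__)
    return labels[best_index]


if __name__ == "__main__":
    # Made-up numbers standing in for the outputs of a trained network.
    print(greedy_emotion(1.0, [0.0, 2.0, -1.0, 3.0]))
```

Because V(s) is added to every action equally, the greedy choice depends only on the advantages; the value stream matters for learning stable Q-value estimates, not for the argmax itself.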

Original language	English
Pages (from-to)	1034-1047
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	32
Issue number	3
DOIs	https://doi.org/10.1109/TCSVT.2021.3072412
State	Published - 1 Mar 2022

Keywords

Multimodal emotion recognition
domain knowledge
real-time video conversation
reinforcement learning

Cite this

@article{435fcdbfe2fa4def8c13284c9d4242ba,

title = "Real-Time Video Emotion Recognition Based on Reinforcement Learning and Domain Knowledge",

abstract = "Multimodal emotion recognition in conversational videos (ERC) has developed rapidly in recent years. To fully extract the relevant context from video clips, most studies build their models on entire dialogues, which leaves them without real-time ERC ability. Unlike related work, this paper proposes a novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK). In ERLDK, a reinforcement learning (RL) algorithm is introduced to conduct real-time ERC as conversations occur. The collection of history utterances is composed into an emotion-pair, which represents the multimodal context of the next utterance to be recognized. A dueling deep Q-network (DDQN) based on gated recurrent unit (GRU) layers is designed to learn the correct action from the candidate emotion categories. Domain knowledge is extracted from a public dataset based on the prior information of emotion-pairs. The extracted domain knowledge is used to revise the results from the RL module and is transferred to another dataset to examine its rationality. Experimental results on the datasets show that ERLDK achieves state-of-the-art results on the weighted average and on most of the specific emotion categories.",

keywords = "Multimodal emotion recognition, domain knowledge, real-time video conversation, reinforcement learning",

author = "Ke Zhang and Yuanqing Li and Jingyu Wang and Erik Cambria and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2022",

month = mar,

day = "1",

doi = "10.1109/TCSVT.2021.3072412",

language = "English",

volume = "32",

pages = "1034--1047",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "3",

}

TY - JOUR

T1 - Real-Time Video Emotion Recognition Based on Reinforcement Learning and Domain Knowledge

AU - Zhang, Ke

AU - Li, Yuanqing

AU - Wang, Jingyu

AU - Cambria, Erik

AU - Li, Xuelong

PY - 2022/3/1

Y1 - 2022/3/1

AB - Multimodal emotion recognition in conversational videos (ERC) has developed rapidly in recent years. To fully extract the relevant context from video clips, most studies build their models on entire dialogues, which leaves them without real-time ERC ability. Unlike related work, this paper proposes a novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK). In ERLDK, a reinforcement learning (RL) algorithm is introduced to conduct real-time ERC as conversations occur. The collection of history utterances is composed into an emotion-pair, which represents the multimodal context of the next utterance to be recognized. A dueling deep Q-network (DDQN) based on gated recurrent unit (GRU) layers is designed to learn the correct action from the candidate emotion categories. Domain knowledge is extracted from a public dataset based on the prior information of emotion-pairs. The extracted domain knowledge is used to revise the results from the RL module and is transferred to another dataset to examine its rationality. Experimental results on the datasets show that ERLDK achieves state-of-the-art results on the weighted average and on most of the specific emotion categories.

KW - Multimodal emotion recognition

KW - domain knowledge

KW - real-time video conversation

KW - reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85104240861&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2021.3072412

DO - 10.1109/TCSVT.2021.3072412

M3 - Article

AN - SCOPUS:85104240861

SN - 1051-8215

VL - 32

SP - 1034

EP - 1047

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 3

ER -
