TY - JOUR
T1 - Concept-guided multi-level attention network for image emotion recognition
AU - Yang, Hansen
AU - Fan, Yangyu
AU - Lv, Guoyun
AU - Liu, Shiya
AU - Guo, Zhe
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
PY - 2024/7
Y1 - 2024/7
AB - Image emotion recognition aims to predict people’s emotional responses toward visual stimuli. Recently, emotional region discovery has become a hot topic in this field because it brings significant improvements to the task. Existing studies mainly discover emotional regions through sophisticated analysis from the object aspect, which is less discriminative for emotion. In this paper, we propose a Concept-guided Multi-level Attention Network (CMANet) that makes full use of attribute-aspect concepts to enhance image emotion recognition. To leverage multiple concepts to guide the mining of emotional regions, CMANet is designed as a multi-level architecture in which an attended semantic feature is first calculated under the guidance of the feature from the holistic branch. Subsequently, with the obtained attended semantic feature, the emotional region of the feature map in the local branch can be attended to. An adaptive fusion method is then proposed so that the attended visual and semantic features complement each other. Notably, for emotion categories that are easily confused, a novel variable-weight cross-entropy loss that enables the model to focus on hard samples is proposed to improve performance. Experiments on several affective image datasets demonstrate that the proposed method is effective and superior to state-of-the-art methods.
KW - Adaptive fusion
KW - Image emotion recognition
KW - Semantic attention
KW - Variable weight cross-entropy loss
KW - Visual attention
UR - http://www.scopus.com/inward/record.url?scp=85187675956&partnerID=8YFLogxK
U2 - 10.1007/s11760-024-03074-8
DO - 10.1007/s11760-024-03074-8
M3 - Article
AN - SCOPUS:85187675956
SN - 1863-1703
VL - 18
SP - 4313
EP - 4326
JO - Signal, Image and Video Processing
JF - Signal, Image and Video Processing
IS - 5
ER -