TY - JOUR
T1 - MOCOLNet
T2 - A Momentum Contrastive Learning Network for Multimodal Aspect-Level Sentiment Analysis
AU - Mu, Jie
AU - Nie, Feiping
AU - Wang, Wei
AU - Xu, Jian
AU - Zhang, Jing
AU - Liu, Han
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Multimodal aspect-level sentiment analysis has attracted increasing attention in recent years. However, existing methods have two unaddressed limitations: (1) due to the lack of labelled pre-training data dedicated to sentiment analysis, methods that rely on a pre-training stage produce suboptimal prediction results; (2) most existing methods employ a self-attention encoder to fuse multimodal tokens, which not only ignores the alignment relationship between different modal tokens but also prevents the model from capturing the semantic links between images and texts. In this paper, we propose a momentum contrastive learning network (MOCOLNet) to overcome the above limitations. First, we merge the pre-training stage with the training stage to design an end-to-end training scheme that uses less labelled data dedicated to sentiment analysis to obtain better prediction results. Second, we propose a multimodal contrastive learning method to align the different modal representations before data fusion, and design a cross-modal matching strategy to provide semantic interactive information between texts and images. Moreover, we introduce an auxiliary momentum strategy to increase the robustness of the model. We also analyse the effectiveness of the proposed multimodal contrastive learning method using mutual information theory. Experiments verify that the proposed MOCOLNet is superior to other strong baselines.
AB - Multimodal aspect-level sentiment analysis has attracted increasing attention in recent years. However, existing methods have two unaddressed limitations: (1) due to the lack of labelled pre-training data dedicated to sentiment analysis, methods that rely on a pre-training stage produce suboptimal prediction results; (2) most existing methods employ a self-attention encoder to fuse multimodal tokens, which not only ignores the alignment relationship between different modal tokens but also prevents the model from capturing the semantic links between images and texts. In this paper, we propose a momentum contrastive learning network (MOCOLNet) to overcome the above limitations. First, we merge the pre-training stage with the training stage to design an end-to-end training scheme that uses less labelled data dedicated to sentiment analysis to obtain better prediction results. Second, we propose a multimodal contrastive learning method to align the different modal representations before data fusion, and design a cross-modal matching strategy to provide semantic interactive information between texts and images. Moreover, we introduce an auxiliary momentum strategy to increase the robustness of the model. We also analyse the effectiveness of the proposed multimodal contrastive learning method using mutual information theory. Experiments verify that the proposed MOCOLNet is superior to other strong baselines.
KW - Aspect-level sentiment analysis
KW - contrastive learning
KW - multimodal representation learning
UR - http://www.scopus.com/inward/record.url?scp=85182360347&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2023.3345022
DO - 10.1109/TKDE.2023.3345022
M3 - Article
AN - SCOPUS:85182360347
SN - 1041-4347
VL - 36
SP - 8787
EP - 8800
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 12
ER -