Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment

Zhixian Zhao, Haifeng Chen, Xi Li, Dongmei Jiang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Multimodal Emotion Recognition (MER) aims to automatically identify and understand human emotional states by integrating information from various modalities. However, the scarcity of annotated multimodal data significantly hinders the advancement of this research field. This paper presents our solution for the MER-SEMI sub-challenge of MER 2024. First, to better adapt acoustic modality features for the MER task, we experimentally evaluate the contributions of different layers of the pre-trained speech model HuBERT in emotion recognition. Based on these observations, we perform Parameter-Efficient Fine-Tuning (PEFT) on the layers identified as most effective for emotion recognition tasks, thereby achieving optimal adaptation for emotion recognition with a minimal number of learnable parameters. Second, leveraging the strengths of the acoustic modality, we propose a feature alignment pre-training method. This approach uses large-scale unlabeled data to train a visual encoder, thereby promoting the semantic alignment of visual features within the acoustic feature space. Finally, using the adapted acoustic features, aligned visual features, and lexical features, we employ an attention mechanism for feature fusion. On the MER2024-SEMI test set, the proposed method achieves a weighted F1 score of 88.90%, ranking fourth among all participating teams, validating the effectiveness of our approach.
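
The result above is reported as a weighted F1 score. Under the usual definition, this is the per-class F1 averaged with weights proportional to each class's number of true samples. The following is a minimal sketch of that computation, not the challenge's official scoring script, which may differ in details. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from MER2024 data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)    # true samples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true     # weight each class F1 by its support
    return total / len(y_true)


# Toy labels, invented for illustration only.
y_true = ["happy", "happy", "happy", "sad", "sad", "angry"]
y_pred = ["happy", "happy", "sad", "sad", "sad", "happy"]
print(f"{weighted_f1(y_true, y_pred):.4f}")  # prints 0.6000
```

In this toy case the per-class F1 values are 2/3 (happy), 0.8 (sad), and 0 (angry), with supports 3, 2, and 1, giving (2 + 1.6 + 0) / 6 = 0.6. Because the weights follow class frequency, rare classes contribute little to the final score.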

Original language: English
Title of host publication: MRAC 2024 - Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing
Publisher: Association for Computing Machinery, Inc
Pages: 67-71
Number of pages: 5
ISBN (electronic): 9798400712036
Publication status: Published - 28 Oct 2024
Event: 2nd International Workshop on Multimodal and Responsible Affective Computing, MRAC 2024 - Melbourne, Australia
Duration: 28 Oct 2024 to 1 Nov 2024

Publication series

Name: MRAC 2024 - Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing

Conference

Conference: 2nd International Workshop on Multimodal and Responsible Affective Computing, MRAC 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 to 1/11/24
