TY - GEN
T1 - A Gesture-centered Dual-modal Network for Micro-Gesture Emotion Recognition
AU - Wang, Ruosi
AU - Wang, Yuhan
AU - Xia, Zhaoqiang
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - To address the issues of insufficient modality collaboration, difficulty in modeling fine-grained dynamic features, and underutilization of gesture information in current micro-behavior emotion recognition, we propose a Gesture-centered Dual-modal Network (GDN) for micro-gesture emotion recognition based on RGB-heatmap two-stream fusion. First, the framework employs a dual-modal feature interaction mechanism to achieve deep complementarity and cooperative enhancement between RGB visual information and skeleton-based gesture features. Second, we introduce a Res2Net3D-based three-dimensional multi-scale feature extraction network that integrates 3D convolutions with a multi-scale residual architecture, effectively enhancing the perception of spatiotemporal dynamics across different scales. In addition, a self-adaptive gesture attention module is designed to improve the modeling of emotional state variations within the heatmap modality. Finally, the proposed method is evaluated on the iMiGUE dataset and compared with algorithms such as TSM+LSTM and PoseC3D, further validating its effectiveness and robustness.
AB - To address the issues of insufficient modality collaboration, difficulty in modeling fine-grained dynamic features, and underutilization of gesture information in current micro-behavior emotion recognition, we propose a Gesture-centered Dual-modal Network (GDN) for micro-gesture emotion recognition based on RGB-heatmap two-stream fusion. First, the framework employs a dual-modal feature interaction mechanism to achieve deep complementarity and cooperative enhancement between RGB visual information and skeleton-based gesture features. Second, we introduce a Res2Net3D-based three-dimensional multi-scale feature extraction network that integrates 3D convolutions with a multi-scale residual architecture, effectively enhancing the perception of spatiotemporal dynamics across different scales. In addition, a self-adaptive gesture attention module is designed to improve the modeling of emotional state variations within the heatmap modality. Finally, the proposed method is evaluated on the iMiGUE dataset and compared with algorithms such as TSM+LSTM and PoseC3D, further validating its effectiveness and robustness.
KW - Adaptive Attention Mechanism
KW - Micro-gesture (MiG) emotion recognition
KW - Multi-scale 3D Convolutional Network
KW - Multimodal Fusion
UR - https://www.scopus.com/pages/publications/105018467235
U2 - 10.1109/ICIPMC66319.2025.11170646
DO - 10.1109/ICIPMC66319.2025.11170646
M3 - Conference contribution
AN - SCOPUS:105018467235
T3 - 2025 4th International Conference on Image Processing and Media Computing, ICIPMC 2025
SP - 109
EP - 113
BT - 2025 4th International Conference on Image Processing and Media Computing, ICIPMC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Image Processing and Media Computing, ICIPMC 2025
Y2 - 27 June 2025 through 29 June 2025
ER -