TY - JOUR
T1 - Micro-gesture Online Recognition with Graph-convolution and Multiscale Transformers for Long Sequence
AU - Guo, Xu Peng
AU - Peng, Wei
AU - Huang, Hexiang
AU - Xia, Zhaoqiang
N1 - Publisher Copyright:
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2023
Y1 - 2023
N2 - Micro-gestures are becoming a fundamental cue for emotion analysis and have attracted increasing attention in this field. Existing studies focus mainly on micro-gesture classification, which predicts the category of a micro-gesture, while little work has been reported on spotting micro-gestures. As a preliminary step for classification, micro-gesture online recognition (spotting), which predicts both the temporal location and the category, has received limited attention. In this context, we propose a novel deep network for micro-gesture online recognition that incorporates graph-convolution and multiscale Transformer encoders. Specifically, we use a graph-convolution-based Transformer module to extract motion features from 2D skeleton sequences, which are then processed by a feature pyramid module to obtain hierarchical multiscale features. We further employ a local Transformer module to model the similarity between micro-gesture frames, and decouple the classification and regression branches to obtain accurate locations and categories. These Transformers are trained with a two-stage strategy and combined to perform spotting. Our method is validated on the iMiGUE dataset and achieved first place in the online recognition task (Track 2) of the MiGA 2023 Challenge.
AB - Micro-gestures are becoming a fundamental cue for emotion analysis and have attracted increasing attention in this field. Existing studies focus mainly on micro-gesture classification, which predicts the category of a micro-gesture, while little work has been reported on spotting micro-gestures. As a preliminary step for classification, micro-gesture online recognition (spotting), which predicts both the temporal location and the category, has received limited attention. In this context, we propose a novel deep network for micro-gesture online recognition that incorporates graph-convolution and multiscale Transformer encoders. Specifically, we use a graph-convolution-based Transformer module to extract motion features from 2D skeleton sequences, which are then processed by a feature pyramid module to obtain hierarchical multiscale features. We further employ a local Transformer module to model the similarity between micro-gesture frames, and decouple the classification and regression branches to obtain accurate locations and categories. These Transformers are trained with a two-stage strategy and combined to perform spotting. Our method is validated on the iMiGUE dataset and achieved first place in the online recognition task (Track 2) of the MiGA 2023 Challenge.
KW - Graph convolution
KW - Micro-gesture online recognition
KW - Multiscale Transformer
UR - http://www.scopus.com/inward/record.url?scp=85177083036&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85177083036
SN - 1613-0073
VL - 3522
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 1st IJCAI Workshop and Challenge on Micro-Gesture Analysis for Hidden Emotion Understanding, MiGA 2023
Y2 - 21 August 2023 through 22 August 2023
ER -