TY - JOUR
T1 - Micro-gesture Online Recognition with Dual-stream Multi-scale Transformer in Long Videos
AU - Wang, Yuhan
AU - Linghu, Ke Rui
AU - Huang, Hexiang
AU - Xia, Zhaoqiang
N1 - Publisher Copyright: © 2024 Copyright for this paper by its authors.
PY - 2024
Y1 - 2024
AB - Micro-gestures are increasingly recognized as a key indicator in emotion analysis and have attracted growing research interest. Most research has focused on the classification of micro-gestures, which entails predicting their categories; comparatively few studies have addressed their detection. Micro-gesture online recognition (spotting), which involves predicting both the temporal position and the category, is a preliminary step for classification but has received limited attention. In this context, we construct a deep network with dual-stream input for micro-gesture online recognition. Specifically, we use a sequential action recognition model to extract motion features from RGB and skeleton sequences separately, which are then processed by a multi-scale Transformer encoder serving as the detection model. The proposed networks are trained with a two-stage strategy and combined to perform temporal spotting. The proposed method is validated on the SMG dataset and ranked first in the online recognition task (Track 2) of the MiGA2024 Challenge.
KW - Dual-stream network
KW - Micro-gesture online recognition
KW - Multi-scale Transformer
UR - http://www.scopus.com/inward/record.url?scp=85212446106&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85212446106
SN - 1613-0073
VL - 3848
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2024 IJCAI Workshop and Challenge on Micro-Gesture Analysis for Hidden Emotion Understanding, MiGA 2024
Y2 - 4 August 2024
ER -