Self-Supervised Global Spatio-Temporal Interaction Pre-Training for Group Activity Recognition

Zexing Du, Xue Wang, Qing Wang

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

This paper explores distinctive spatio-temporal representations for group activity recognition in a self-supervised manner. Previous networks treat spatial- and temporal-aware information as a whole, limiting their ability to represent the complex spatio-temporal correlations in group activity. We therefore propose Spatial and Temporal Attention Heads (STAHs) to extract spatial- and temporal-aware representations independently, generating complementary contexts that boost group activity understanding. We then propose a Global Spatio-Temporal Contrastive (GSTCo) loss to aggregate these two kinds of features. Unlike previous works that focus on the temporal consistency of individuals while overlooking the correlations between actors, i.e., a local perspective, we exploit global spatial and temporal dependencies. Moreover, GSTCo effectively avoids the trivial solutions faced in contrastive learning by striking the right balance between spatial and temporal representations. Furthermore, our method introduces only affordable overhead during pre-training and adds no extra parameters or computational cost at inference, guaranteeing efficiency. Evaluated on widely used group activity recognition datasets, our method performs well, and applying our pre-trained backbone to existing networks achieves state-of-the-art performance. Extensive experiments verify the generalizability of our method.
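The abstract does not give the exact form of the GSTCo loss; a common pattern it resembles is a symmetric InfoNCE-style contrastive objective between the spatial-aware and temporal-aware features of the same clip, where matching pairs are positives and all cross-clip pairs are negatives. The sketch below is a minimal NumPy illustration under that assumption (the function name `gstco_like_loss` and the hyperparameters are hypothetical, not the paper's implementation):

```python
import numpy as np

def gstco_like_loss(spatial, temporal, temperature=0.1):
    """Symmetric InfoNCE-style loss between two feature views.

    spatial, temporal: (N, D) arrays; row i of each is assumed to come
    from the same clip (positive pair). All other cross pairs act as
    negatives. This is an illustrative stand-in, not the paper's GSTCo.
    """
    # L2-normalize so the dot product is cosine similarity
    s = spatial / np.linalg.norm(spatial, axis=1, keepdims=True)
    t = temporal / np.linalg.norm(temporal, axis=1, keepdims=True)
    sim = s @ t.T / temperature  # (N, N) similarity logits

    def xent_diag(logits):
        # cross-entropy where the target for row i is column i
        logits = logits - logits.max(axis=1, keepdims=True)
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # average both directions: spatial->temporal and temporal->spatial
    return 0.5 * (xent_diag(sim) + xent_diag(sim.T))
```

Aligned views (identical features per clip) should yield a markedly lower loss than mismatched random views, which is what keeps the two heads' representations in balance rather than collapsing to a trivial solution.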

Original language: English
Pages (from-to): 5076-5088
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 9
DOI
Publication status: Published - 1 Sep 2023
