Exploring global context and position-aware representation for group activity recognition

Zexing Du, Qing Wang

Research output: Contribution to journal › Article › peer-review

Abstract

This paper explores scene context and position information for group activity understanding. Previous group activity recognition methods reason over individual features without considering the information in the scene. We argue that, besides correlations among actors, integrating the scene context provides useful supplementary cues. We therefore propose a new network, termed Contextual Transformer Network (CTN), to incorporate global contextual information into individual representations. The position of individuals also plays a vital role in group activity understanding. Unlike previous methods that explore correlations among individuals semantically, we propose Clustered Position Embedding (CPE) to integrate the spatial structure of actors and produce position-aware representations. Experimental results on two widely used datasets for sports video and social activity (the Volleyball and Collective Activity datasets) show that the proposed method outperforms state-of-the-art approaches. In particular, with ResNet-18 as the backbone, our method achieves 93.6%/93.9% MCA/MPCA on the Volleyball dataset and 95.4%/96.3% MCA/MPCA on the Collective Activity dataset.

Original language: English
Article number: 105181
Journal: Image and Vision Computing
Volume: 149
Publication status: Published - Sep 2024
