Exploring global context and position-aware representation for group activity recognition

Zexing Du, Qing Wang

Research output: Contribution to journal › Article › peer-review

Abstract

This paper explores contextual and positional information in the scene for group activity understanding. Previous group activity recognition methods reason over individual features without considering information from the scene as a whole. We argue that, beyond correlations among actors, integrating scene context provides useful and complementary cues. We therefore propose a new network, termed Contextual Transformer Network (CTN), to incorporate global contextual information into individual representations. In addition, the positions of individuals play a vital role in group activity understanding. Unlike previous methods that model correlations among individuals only semantically, we propose Clustered Position Embedding (CPE) to integrate the spatial structure of actors and produce position-aware representations. Experimental results on two widely used datasets for sports video and social activity (i.e., the Volleyball and Collective Activity datasets) show that the proposed method outperforms state-of-the-art approaches. In particular, with ResNet-18 as the backbone, our method achieves 93.6/93.9% MCA/MPCA on the Volleyball dataset and 95.4/96.3% MCA/MPCA on the Collective Activity dataset.
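The abstract does not specify how Clustered Position Embedding is implemented. Purely as a rough illustration of the general idea — grouping actors by spatial location and giving each group a shared position embedding — here is a minimal NumPy sketch. The grid-quantization scheme, function name, and random embedding table are all hypothetical assumptions for illustration, not the authors' method:

```python
import numpy as np

def clustered_position_embedding(centers, embed_dim=8, grid=2, seed=0):
    """Toy sketch of a clustered position embedding (hypothetical).

    Each actor's normalized (x, y) center is quantized into a grid x grid
    spatial layout; actors falling in the same cell share one embedding
    vector, looked up from a (randomly initialised) per-cluster table.
    """
    rng = np.random.default_rng(seed)
    table = rng.standard_normal((grid * grid, embed_dim))  # one row per cluster
    # Quantize coordinates in [0, 1] into grid bins, clamping the upper edge.
    xs = np.minimum((centers[:, 0] * grid).astype(int), grid - 1)
    ys = np.minimum((centers[:, 1] * grid).astype(int), grid - 1)
    cluster_ids = xs * grid + ys  # flat cluster index per actor
    return table[cluster_ids], cluster_ids
```

In a full model, the returned embeddings would typically be added to the per-actor features before attention, so that actors in the same spatial cluster share positional context.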

Original language: English
Article number: 105181
Journal: Image and Vision Computing
Volume: 149
State: Published - Sep 2024

Keywords

  • Group activity recognition
  • Position-aware representation
  • Spatio-temporal representation

