Co-Occurrence Matters: Learning Action Relation for Temporal Action Localization

Congqi Cao, Yizhe Wang, Yueran Zhang, Yue Lu, Xin Zhang, Yanning Zhang

Research output: Contribution to journalArticlepeer-review

8 Scopus citations

Abstract

Temporal action localization (TAL) is a prevailing task due to its great application potential. Existing works in this field mainly suffer from two weaknesses: (1) They often neglect the multi-label case and only focus on temporal modeling. (2) They ignore the semantic information in class labels and only use the visual information. To solve these problems, we propose a novel Co-Occurrence Relation Module (CORM) that explicitly models the co-occurrence relationship between actions. Besides the visual information, it further utilizes the semantic embeddings of class labels to model the co-occurrence relationship. The CORM works in a plug-and-play manner and can be easily incorporated with the existing sequence models. By considering both visual and semantic co-occurrence, our method achieves high multi-label relationship modeling capacity. Meanwhile, existing datasets in TAL always focus on low-semantic atomic actions. Thus we construct a challenging multi-label dataset UCF-Crime-TAL that focuses on high-semantic actions by annotating the UCF-Crime dataset at frame level and considering the semantic overlap of different events. Extensive experiments on two commonly used TAL datasets, i.e., MultiTHUMOS and TSU, and our newly proposed UCF-Crime-TAL demenstrate the effectiveness of the proposed CORM, which achieves state-of-the-art performance on these datasets.

Original languageEnglish
Pages (from-to)3327-3339
Number of pages13
JournalIEEE Transactions on Circuits and Systems for Video Technology
Volume34
Issue number5
DOIs
StatePublished - 1 May 2024

Keywords

  • Co-occurrence relationship
  • semantic information
  • temporal action localization

Fingerprint

Dive into the research topics of 'Co-Occurrence Matters: Learning Action Relation for Temporal Action Localization'. Together they form a unique fingerprint.

Cite this