Click-level supervision for online action detection extended from SCOAD

Xing Zhang, Yuhan Mei, Ye Na, Xia Ling Lin, Genqing Bian, Qingsen Yan, Ghulam Mohi-ud-din, Chen Ai, Zhou Li, Wei Dong

Research output: Contribution to journal › Article › peer-review

Abstract

Fully supervised, data-driven online action detection algorithms rely heavily on manual annotations, which are difficult to obtain in real-world applications. Recent work addresses this problem with weakly supervised online action detection (WOAD) methods that use video-level annotations. However, because video-level labels carry no explicit temporal information, these methods often produce blurred temporal boundaries. In this work, we revisit WOAD and propose an algorithm for weakly supervised online action detection using click-level annotations, which we call Single-frame Click Supervision for Online Action Detection (SCOAD). SCOAD significantly improves prediction accuracy without substantially increasing annotation cost. The improvement comes from a set of loss functions designed to exploit the limited temporal information that click labels provide. We also present an enhanced version, SCOAD++, which introduces a novel mechanism that improves the model's use of historical information and its ability to distinguish fine details, addressing a limitation of traditional fully connected frameworks, which neglect temporal variations. Furthermore, to study the accuracy variation caused by the inherent randomness of click-level annotation, we constructed a human fitness video dataset. We also use this dataset to reveal the limitations of video-level labels for action detection. Extensive experiments on multiple benchmark datasets demonstrate that our approach outperforms state-of-the-art methods.
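To make the idea of click-level supervision concrete, the following is a minimal sketch of a loss that is evaluated only at annotated frames. It is an illustration under stated assumptions, not the loss functions used in SCOAD, which the abstract does not specify. The function name `click_supervised_loss`, the `clicks` mapping, and the toy scores are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def click_supervised_loss(frame_logits, clicks):
    """Mean cross-entropy over clicked frames only (hypothetical helper).

    frame_logits: per-frame class scores, a list of T lists of C floats.
    clicks: dict mapping a frame index to its class index. Assumes one
            click per action instance; all other frames are unlabeled
            and contribute nothing to the loss.
    """
    if not clicks:
        return 0.0
    total = 0.0
    for t, cls in clicks.items():
        probs = softmax(frame_logits[t])
        total += -math.log(probs[cls])
    return total / len(clicks)


# Toy video: 4 frames, 2 classes (0 = background, 1 = action).
frame_logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0], [2.0, 0.0]]
clicks = {1: 1}  # a single click on frame 1, labeled as class 1
loss = click_supervised_loss(frame_logits, clicks)
```

A video-level label would only state that class 1 occurs somewhere in the video. The click additionally ties the label to one specific frame, which is the limited temporal information the abstract refers to.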

Original language: English
Article number: 107668
Journal: Future Generation Computer Systems
Volume: 166
State: Published - May 2025

Keywords

  • Computer vision
  • Online action detection
  • Video understanding
  • Weakly supervised learning
