Abstract
Data-driven, fully supervised online action detection algorithms rely heavily on manual annotations, which are difficult to obtain in real-world applications. Recent research addresses this issue with weakly supervised online action detection (WOAD) methods that use video-level annotations. However, because these annotations carry no explicit temporal information, such approaches often produce blurred temporal boundaries. In this work, we revisit WOAD and propose an algorithm for weakly supervised online action detection using click-level annotations, which we call Single-frame Click Supervision for Online Action Detection (SCOAD). SCOAD significantly improves prediction accuracy without substantially increasing the annotation cost. This improvement is achieved through a set of loss functions designed to leverage the limited temporal information provided by click labels. Additionally, we present an enhanced version of SCOAD called SCOAD++. It introduces a novel mechanism that improves the model's ability to use historical information and to distinguish fine details, addressing a limitation of traditional fully connected frameworks, which neglect temporal variations. Furthermore, to study the variation in accuracy caused by the inherent randomness of click-level annotation, we constructed a human fitness video dataset. Using this dataset, we also reveal the limitations of video-level labels for action detection. We perform extensive experiments on numerous benchmark datasets and demonstrate that our approach outperforms state-of-the-art methods.
| Original language | English |
|---|---|
| Article number | 107668 |
| Journal | Future Generation Computer Systems |
| Volume | 166 |
| State | Published - May 2025 |
Keywords
- Computer vision
- Online action detection
- Video understanding
- Weakly supervised learning
Fingerprint
Dive into the research topics of 'Click-level supervision for online action detection extended from SCOAD'. Together they form a unique fingerprint.