TY - JOUR
T1 - Click-level supervision for online action detection extended from SCOAD
AU - Zhang, Xing
AU - Mei, Yuhan
AU - Na, Ye
AU - Lin, Xia Ling
AU - Bian, Genqing
AU - Yan, Qingsen
AU - Mohi-ud-din, Ghulam
AU - Ai, Chen
AU - Li, Zhou
AU - Dong, Wei
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2025/5
Y1 - 2025/5
N2 - Data-driven fully-supervised online action detection algorithms heavily rely on manual annotations, which are challenging to obtain in real-world applications. Current research efforts aim to address this issue by introducing weakly supervised online action detection (WOAD) methods that utilize video-level annotations. However, these approaches frequently face challenges with blurred temporal boundaries, stemming from the lack of explicit temporal information. In this work, we revisit WOAD and propose an algorithm for weakly supervised online action detection using click-level annotations, which we call Single-frame Click Supervision for Online Action Detection (SCOAD). SCOAD stands out by significantly improving prediction accuracy without substantially increasing the annotation cost. This improvement is achieved through a set of well-engineered loss functions that leverage the limited temporal information provided by click labels. Additionally, we present an enhanced version of SCOAD called SCOAD++, which introduces a novel mechanism that strengthens the model's ability to utilize historical information and refines detail differentiation, addressing the limitations of traditional fully connected frameworks that neglect temporal variations. Furthermore, to explore the accuracy variation caused by the inherent randomness of click-level annotation, we construct a human fitness video dataset for this study. Using this dataset, we also reveal the limitations of video-level labels in action detection. We perform extensive experiments on numerous benchmark datasets and demonstrate that our approach outperforms state-of-the-art methods.
AB - Data-driven fully-supervised online action detection algorithms heavily rely on manual annotations, which are challenging to obtain in real-world applications. Current research efforts aim to address this issue by introducing weakly supervised online action detection (WOAD) methods that utilize video-level annotations. However, these approaches frequently face challenges with blurred temporal boundaries, stemming from the lack of explicit temporal information. In this work, we revisit WOAD and propose an algorithm for weakly supervised online action detection using click-level annotations, which we call Single-frame Click Supervision for Online Action Detection (SCOAD). SCOAD stands out by significantly improving prediction accuracy without substantially increasing the annotation cost. This improvement is achieved through a set of well-engineered loss functions that leverage the limited temporal information provided by click labels. Additionally, we present an enhanced version of SCOAD called SCOAD++, which introduces a novel mechanism that strengthens the model's ability to utilize historical information and refines detail differentiation, addressing the limitations of traditional fully connected frameworks that neglect temporal variations. Furthermore, to explore the accuracy variation caused by the inherent randomness of click-level annotation, we construct a human fitness video dataset for this study. Using this dataset, we also reveal the limitations of video-level labels in action detection. We perform extensive experiments on numerous benchmark datasets and demonstrate that our approach outperforms state-of-the-art methods.
KW - Computer vision
KW - Online action detection
KW - Video understanding
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85213212015&partnerID=8YFLogxK
U2 - 10.1016/j.future.2024.107668
DO - 10.1016/j.future.2024.107668
M3 - Article
AN - SCOPUS:85213212015
SN - 0167-739X
VL - 166
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
M1 - 107668
ER -