SCOAD: Single-Frame Click Supervision for Online Action Detection

Na Ye; Xing Zhang; Dawei Yan; Wei Dong; Qingsen Yan

doi:10.1007/978-3-031-26316-3_14

School of Computer Science

Xi'an University of Architecture and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Online action detection based on supervised learning requires heavy manual annotation, which is difficult to obtain and may be impractical in real applications. Weakly supervised online action detection (WOAD) can effectively mitigate the problem of substantial labeling costs by using video-level labels. In this paper, we revisit WOAD and propose a weakly supervised online action detection method that uses click-level labels for training, named Single-frame Click Supervision for Online Action Detection (SCOAD). Compared with video-level labels, click-level labels can effectively improve prediction accuracy by carrying a small amount of temporal information without massively increasing the difficulty and cost of annotation. Specifically, SCOAD includes two jointly trained modules, i.e., Action Instance Miner (AIM) and Online Action Detector (OAD). To make the guidance for training the network as accurate as possible, AIM mines pseudo-action instances under the supervision of click labels. Meanwhile, we generate video similarity instances offline from the similarity between video frames and use them to perform finer-grained filtering of erroneous instances generated by AIM. OAD is trained jointly with AIM for online action detection using the pseudo frame-level labels converted from the filtered pseudo-action instances. We conduct extensive experiments on two benchmark datasets to demonstrate that SCOAD can effectively mine and utilize the small amount of temporal information in click-level labels. Code is available at https://github.com/zstarN70/SCOAD.git.
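
The pipeline the abstract outlines (a similarity instance grown around a click, filtering of a mined instance, conversion to frame-level labels) can be illustrated with a small sketch. This is not the authors' implementation, which is in the repository linked above. The function names, the cosine-similarity criterion, the 0.8 threshold, and the clip-to-overlap filtering rule are all assumptions chosen for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_instance(features, click, threshold=0.8):
    """Grow a segment around the clicked frame while neighbours stay similar to it.

    Hypothetical stand-in for an offline "video similarity instance".
    Returns inclusive (start, end) frame indices.
    """
    sim = [cosine(f, features[click]) for f in features]
    start = end = click
    while start > 0 and sim[start - 1] >= threshold:
        start -= 1
    while end < len(sim) - 1 and sim[end + 1] >= threshold:
        end += 1
    return start, end

def filter_instance(mined, similar):
    """Clip a mined (start, end) instance to the similarity instance.

    Assumed filtering rule: keep only the overlap, or None if there is none.
    """
    start, end = max(mined[0], similar[0]), min(mined[1], similar[1])
    return (start, end) if start <= end else None

def frame_labels(num_frames, instances):
    """Convert (start, end, class_id) instances to per-frame labels; 0 is background."""
    labels = [0] * num_frames
    for start, end, class_id in instances:
        for t in range(start, end + 1):
            labels[t] = class_id
    return labels

# Toy video: six frames with 2-D features; the annotator clicked frame 3 (class 2).
features = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]
similar = similarity_instance(features, click=3)
kept = filter_instance((1, 3), similar)  # (1, 3) plays the role of a mined instance
labels = frame_labels(len(features), [(kept[0], kept[1], 2)])
```

In this toy case the mined instance reaches one frame too far to the left, and clipping it against the similarity instance trims that frame before the labels are produced. In the paper the mined instances come from the AIM module, which this sketch replaces with a hand-written segment.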

Original language	English
Title of host publication	Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings
Editors	Lei Wang, Juergen Gall, Tat-Jun Chin, Imari Sato, Rama Chellappa
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	223-238
Number of pages	16
ISBN (Print)	9783031263156
DOIs	https://doi.org/10.1007/978-3-031-26316-3_14
State	Published - 2023
Event	16th Asian Conference on Computer Vision, ACCV 2022 - Macao, China Duration: 4 Dec 2022 → 8 Dec 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13844 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	16th Asian Conference on Computer Vision, ACCV 2022
Country/Territory	China
City	Macao
Period	4 Dec 2022 → 8 Dec 2022

Keywords

Online action detection
Weakly supervised learning

Cite this

Ye, N., Zhang, X., Yan, D., Dong, W., & Yan, Q. (2023). SCOAD: Single-Frame Click Supervision for Online Action Detection. In L. Wang, J. Gall, T.-J. Chin, I. Sato, & R. Chellappa (Eds.), Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings (pp. 223-238). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13844 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26316-3_14

Ye, Na ; Zhang, Xing ; Yan, Dawei et al. / SCOAD : Single-Frame Click Supervision for Online Action Detection. Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings. editor / Lei Wang ; Juergen Gall ; Tat-Jun Chin ; Imari Sato ; Rama Chellappa. Springer Science and Business Media Deutschland GmbH, 2023. pp. 223-238 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{d57d8f385e664356924781a72724a768,

title = "SCOAD: Single-Frame Click Supervision for Online Action Detection",

keywords = "Online action detection, Weakly supervised learning",

author = "Na Ye and Xing Zhang and Dawei Yan and Wei Dong and Qingsen Yan",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 16th Asian Conference on Computer Vision, ACCV 2022 ; Conference date: 04-12-2022 Through 08-12-2022",

year = "2023",

doi = "10.1007/978-3-031-26316-3_14",

language = "English",

isbn = "9783031263156",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "223--238",

editor = "Lei Wang and Juergen Gall and Tat-Jun Chin and Imari Sato and Rama Chellappa",

booktitle = "Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings",

}

Ye, N, Zhang, X, Yan, D, Dong, W & Yan, Q 2023, SCOAD: Single-Frame Click Supervision for Online Action Detection. in L Wang, J Gall, T-J Chin, I Sato & R Chellappa (eds), Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13844 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 223-238, 16th Asian Conference on Computer Vision, ACCV 2022, Macao, China, 4/12/22. https://doi.org/10.1007/978-3-031-26316-3_14

SCOAD: Single-Frame Click Supervision for Online Action Detection. / Ye, Na; Zhang, Xing; Yan, Dawei et al.
Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings. ed. / Lei Wang; Juergen Gall; Tat-Jun Chin; Imari Sato; Rama Chellappa. Springer Science and Business Media Deutschland GmbH, 2023. p. 223-238 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13844 LNCS).

TY - GEN

T1 - SCOAD

T2 - 16th Asian Conference on Computer Vision, ACCV 2022

AU - Ye, Na

AU - Zhang, Xing

AU - Yan, Dawei

AU - Dong, Wei

AU - Yan, Qingsen

PY - 2023

Y1 - 2023

KW - Online action detection

KW - Weakly supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85151046481&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-26316-3_14

DO - 10.1007/978-3-031-26316-3_14

M3 - Conference contribution

AN - SCOPUS:85151046481

SN - 9783031263156

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 223

EP - 238

BT - Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings

A2 - Wang, Lei

A2 - Gall, Juergen

A2 - Chin, Tat-Jun

A2 - Sato, Imari

A2 - Chellappa, Rama

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 4 December 2022 through 8 December 2022

ER -

Ye N, Zhang X, Yan D, Dong W, Yan Q. SCOAD: Single-Frame Click Supervision for Online Action Detection. In Wang L, Gall J, Chin TJ, Sato I, Chellappa R, editors, Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings. Springer Science and Business Media Deutschland GmbH. 2023. p. 223-238. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-26316-3_14