DR.VIC: Decomposition and Reasoning for Video Individual Counting

Tao Han; Lei Bai; Junyu Gao; Qi Wang; Wanli Ouyang

doi:10.1109/CVPR52688.2022.00309

DR.VIC: Decomposition and Reasoning for Video Individual Counting

Tao Han, Lei Bai, Junyu Gao, Qi Wang, Wanli Ouyang

School of Artificial Intelligence, OPtics and Electronics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

23 Scopus citations

Abstract

Pedestrian counting is a fundamental tool for under-standing pedestrian patterns and crowd flow analysis. Existing works (e.g., image-level pedestrian counting, cross-line crowd counting et al.) either only focus on the image-level counting or are constrained to the manual annotation of lines. In this work, we propose to conduct the pedes-trian counting from a new perspective - Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once). Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame. Then, an end-to-end Decomposition and Reasoning Network (DRNet) is designed to predict the initial pedestrian count with the density estimation method and reason the new pedestrian's count of each frame with the differentiable optimal transport. Extensive experiments are conducted on two datasets with congested pedestrians and diverse scenes, demonstrating the effectiveness of our method over baselines with great superiority in counting the individual pedestrians. Code: https://github.com/taohan10200/DRNet.

Original language	English
Title of host publication	Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher	IEEE Computer Society
Pages	3073-3082
Number of pages	10
ISBN (Electronic)	9781665469463
DOIs	https://doi.org/10.1109/CVPR52688.2022.00309
State	Published - 2022
Event	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States Duration: 19 Jun 2022 → 24 Jun 2022

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2022-June
ISSN (Print)	1063-6919

Conference

Conference	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory	United States
City	New Orleans
Period	19/06/22 → 24/06/22

Keywords

Scene analysis and understanding
Video analysis and understanding

Access to Document

10.1109/CVPR52688.2022.00309

Cite this

Han, T., Bai, L., Gao, J., Wang, Q., & Ouyang, W. (2022). DR.VIC: Decomposition and Reasoning for Video Individual Counting. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 (pp. 3073-3082). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2022-June). IEEE Computer Society. https://doi.org/10.1109/CVPR52688.2022.00309

@inproceedings{190364590e45489ea6febb41cdd5d799,

title = "DR.VIC: Decomposition and Reasoning for Video Individual Counting",

abstract = "Pedestrian counting is a fundamental tool for under-standing pedestrian patterns and crowd flow analysis. Existing works (e.g., image-level pedestrian counting, cross-line crowd counting et al.) either only focus on the image-level counting or are constrained to the manual annotation of lines. In this work, we propose to conduct the pedes-trian counting from a new perspective - Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once). Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame. Then, an end-to-end Decomposition and Reasoning Network (DRNet) is designed to predict the initial pedestrian count with the density estimation method and reason the new pedestrian's count of each frame with the differentiable optimal transport. Extensive experiments are conducted on two datasets with congested pedestrians and diverse scenes, demonstrating the effectiveness of our method over baselines with great superiority in counting the individual pedestrians. Code: https://github.com/taohan10200/DRNet.",

keywords = "Scene analysis and understanding, Video analysis and understanding",

author = "Tao Han and Lei Bai and Junyu Gao and Qi Wang and Wanli Ouyang",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 ; Conference date: 19-06-2022 Through 24-06-2022",

year = "2022",

doi = "10.1109/CVPR52688.2022.00309",

language = "英语",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "3073--3082",

booktitle = "Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022",

}

Han, T, Bai, L, Gao, J , Wang, Q & Ouyang, W 2022, DR.VIC: Decomposition and Reasoning for Video Individual Counting. in Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2022-June, IEEE Computer Society, pp. 3073-3082, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, United States, 19/06/22. https://doi.org/10.1109/CVPR52688.2022.00309

DR.VIC: Decomposition and Reasoning for Video Individual Counting. / Han, Tao; Bai, Lei; Gao, Junyu et al.
Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022. IEEE Computer Society, 2022. p. 3073-3082 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2022-June).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - DR.VIC

T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

AU - Han, Tao

AU - Bai, Lei

AU - Gao, Junyu

AU - Wang, Qi

AU - Ouyang, Wanli

PY - 2022

Y1 - 2022

N2 - Pedestrian counting is a fundamental tool for under-standing pedestrian patterns and crowd flow analysis. Existing works (e.g., image-level pedestrian counting, cross-line crowd counting et al.) either only focus on the image-level counting or are constrained to the manual annotation of lines. In this work, we propose to conduct the pedes-trian counting from a new perspective - Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once). Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame. Then, an end-to-end Decomposition and Reasoning Network (DRNet) is designed to predict the initial pedestrian count with the density estimation method and reason the new pedestrian's count of each frame with the differentiable optimal transport. Extensive experiments are conducted on two datasets with congested pedestrians and diverse scenes, demonstrating the effectiveness of our method over baselines with great superiority in counting the individual pedestrians. Code: https://github.com/taohan10200/DRNet.

AB - Pedestrian counting is a fundamental tool for under-standing pedestrian patterns and crowd flow analysis. Existing works (e.g., image-level pedestrian counting, cross-line crowd counting et al.) either only focus on the image-level counting or are constrained to the manual annotation of lines. In this work, we propose to conduct the pedes-trian counting from a new perspective - Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once). Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame. Then, an end-to-end Decomposition and Reasoning Network (DRNet) is designed to predict the initial pedestrian count with the density estimation method and reason the new pedestrian's count of each frame with the differentiable optimal transport. Extensive experiments are conducted on two datasets with congested pedestrians and diverse scenes, demonstrating the effectiveness of our method over baselines with great superiority in counting the individual pedestrians. Code: https://github.com/taohan10200/DRNet.

KW - Scene analysis and understanding

KW - Video analysis and understanding

UR - http://www.scopus.com/inward/record.url?scp=85135953916&partnerID=8YFLogxK

U2 - 10.1109/CVPR52688.2022.00309

DO - 10.1109/CVPR52688.2022.00309

M3 - 会议稿件

AN - SCOPUS:85135953916

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 3073

EP - 3082

BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

PB - IEEE Computer Society

Y2 - 19 June 2022 through 24 June 2022

ER -

Han T, Bai L, Gao J , Wang Q, Ouyang W. DR.VIC: Decomposition and Reasoning for Video Individual Counting. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022. IEEE Computer Society. 2022. p. 3073-3082. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR52688.2022.00309

DR.VIC: Decomposition and Reasoning for Video Individual Counting

Abstract

Publication series

Conference

Keywords

Access to Document

Other files and links

Fingerprint

Cite this