Body structure aware deep crowd counting

Siyu Huang; Xi Li; Zhongfei Zhang; Fei Wu; Shenghua Gao; Rongrong Ji; Junwei Han

doi:10.1109/TIP.2017.2740160

School of Automation

Research output: Contribution to journal › Article › peer-review

112 Scopus citations

Abstract

Crowd counting is a challenging task, mainly due to the severe occlusions among dense crowds. This paper aims to take a broader view to address crowd counting from the perspective of semantic modeling. In essence, crowd counting is a task of pedestrian semantic analysis involving three key factors: pedestrians, heads, and their context structure. The information of different body parts is an important cue to help us judge whether there exists a person at a certain position. Existing methods usually perform crowd counting from the perspective of directly modeling the visual properties of either the whole body or the heads only, without explicitly capturing the composite body-part semantic structure information that is crucial for crowd counting. In our approach, we first formulate the key factors of crowd counting as semantic scene models. Then, we convert the crowd counting problem into a multi-task learning problem, such that the semantic scene models are turned into different sub-tasks. Finally, deep convolutional neural networks are used to learn the sub-tasks in a unified scheme. Our approach encodes the semantic nature of crowd counting and provides a novel solution in terms of pedestrian semantic analysis. In experiments, our approach outperforms the state-of-the-art methods on four benchmark crowd counting data sets. The semantic structure information is demonstrated to be an effective cue for crowd counting.

Original language	English
Pages (from-to)	1049-1059
Number of pages	11
Journal	IEEE Transactions on Image Processing
Volume	27
Issue number	3
DOIs	https://doi.org/10.1109/TIP.2017.2740160
State	Published - Mar 2018

Keywords

Convolutional neural networks
Crowd counting
Pedestrian semantic analysis
Visual context structure

Cite this

@article{df538c8bdf1e43a68757320cf5ec109f,

title = "Body structure aware deep crowd counting",

abstract = "Crowd counting is a challenging task, mainly due to the severe occlusions among dense crowds. This paper aims to take a broader view to address crowd counting from the perspective of semantic modeling. In essence, crowd counting is a task of pedestrian semantic analysis involving three key factors: pedestrians, heads, and their context structure. The information of different body parts is an important cue to help us judge whether there exists a person at a certain position. Existing methods usually perform crowd counting from the perspective of directly modeling the visual properties of either the whole body or the heads only, without explicitly capturing the composite body-part semantic structure information that is crucial for crowd counting. In our approach, we first formulate the key factors of crowd counting as semantic scene models. Then, we convert the crowd counting problem into a multi-task learning problem, such that the semantic scene models are turned into different sub-tasks. Finally, deep convolutional neural networks are used to learn the sub-tasks in a unified scheme. Our approach encodes the semantic nature of crowd counting and provides a novel solution in terms of pedestrian semantic analysis. In experiments, our approach outperforms the state-of-the-art methods on four benchmark crowd counting data sets. The semantic structure information is demonstrated to be an effective cue for crowd counting.",

keywords = "Convolutional neural networks, Crowd counting, Pedestrian semantic analysis, Visual context structure",

author = "Siyu Huang and Xi Li and Zhongfei Zhang and Fei Wu and Shenghua Gao and Rongrong Ji and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.",

year = "2018",

month = mar,

doi = "10.1109/TIP.2017.2740160",

language = "English",

volume = "27",

pages = "1049--1059",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "3",

}

TY - JOUR

T1 - Body structure aware deep crowd counting

AU - Huang, Siyu

AU - Li, Xi

AU - Zhang, Zhongfei

AU - Wu, Fei

AU - Gao, Shenghua

AU - Ji, Rongrong

AU - Han, Junwei

PY - 2018/3

Y1 - 2018/3

AB - Crowd counting is a challenging task, mainly due to the severe occlusions among dense crowds. This paper aims to take a broader view to address crowd counting from the perspective of semantic modeling. In essence, crowd counting is a task of pedestrian semantic analysis involving three key factors: pedestrians, heads, and their context structure. The information of different body parts is an important cue to help us judge whether there exists a person at a certain position. Existing methods usually perform crowd counting from the perspective of directly modeling the visual properties of either the whole body or the heads only, without explicitly capturing the composite body-part semantic structure information that is crucial for crowd counting. In our approach, we first formulate the key factors of crowd counting as semantic scene models. Then, we convert the crowd counting problem into a multi-task learning problem, such that the semantic scene models are turned into different sub-tasks. Finally, deep convolutional neural networks are used to learn the sub-tasks in a unified scheme. Our approach encodes the semantic nature of crowd counting and provides a novel solution in terms of pedestrian semantic analysis. In experiments, our approach outperforms the state-of-the-art methods on four benchmark crowd counting data sets. The semantic structure information is demonstrated to be an effective cue for crowd counting.

KW - Convolutional neural networks

KW - Crowd counting

KW - Pedestrian semantic analysis

KW - Visual context structure

UR - http://www.scopus.com/inward/record.url?scp=85028456098&partnerID=8YFLogxK

U2 - 10.1109/TIP.2017.2740160

DO - 10.1109/TIP.2017.2740160

M3 - Article

C2 - 28816665

AN - SCOPUS:85028456098

SN - 1057-7149

VL - 27

SP - 1049

EP - 1059

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

IS - 3

ER -
