Video Crowd Localization With Multifocus Gaussian Neighborhood Attention and a Large-Scale Benchmark

Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyu Gao, Bin Zhao, Rui Zhang, Jun Hou

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Video crowd localization is a crucial yet challenging task that aims to estimate the exact locations of human heads in crowded videos. To model the spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighborhood attention (GNA), which can effectively exploit long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, our GNA can also capture the scale variation of human heads well through its multi-focus mechanism. Based on the multi-focus GNA, we develop a unified neural network called GNANet to accurately locate head centers in video clips by fully aggregating spatial-temporal information via a scene modeling module and a context cross-attention module. Moreover, to facilitate future research in this field, we introduce a large-scale crowd video benchmark named VSCrowd (https://github.com/HopLee6/VSCrowd), which consists of 60K+ frames captured in various surveillance scenes and 2M+ head annotations. Finally, we conduct extensive experiments on three datasets, including our VSCrowd, and the experimental results show that the proposed method achieves state-of-the-art performance for both video crowd localization and counting.

Original language: English
Pages (from-to): 6032-6047
Number of pages: 16
Journal: IEEE Transactions on Image Processing
Volume: 31
Publication status: Published - 2022
