Abstract
Estimating crowd counts automatically with computer vision has attracted great attention because of its many practical applications. Crowd counting poses many challenges, and one of the main difficulties is scale variation: the scales of people's heads vary dramatically across images and between different regions of the same image. In this paper, we tackle this problem by proposing a novel scale-aware counting model named FPN-LDA Net. In it, the Feature Pyramid Network (FPN) handles scale variation by fusing multi-scale feature maps from different depths of the network, and the Local Difference Attention (LDA) module captures the local differences between the multi-scale pyramid pooling features at a specific location and those of its neighborhood. To handle head-scale variation within the same image, the dynamically learned difference scores are used as weights that adaptively highlight the scale-varying head regions that need attention and filter out irrelevant background regions. We conduct extensive experiments on three widely adopted benchmark datasets: UCF-QNRF, ShanghaiTech, and UCF_CC_50. The experimental results show the superiority of the proposed method.
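The idea of weighting features by local difference scores can be illustrated with a small, parameter-free sketch. This is not the authors' LDA module: it assumes a single (C, H, W) feature map, fixed average-pooling bin sizes (1, 2, 4) plus the map itself as the finest level, a 3×3 neighbourhood, and a sigmoid of the channel-averaged absolute difference as the score. All function names and parameters here are hypothetical, and the learned convolutions of the real model are omitted.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid (H, W divisible by bins)."""
    c, h, w = x.shape
    return x.reshape(c, bins, h // bins, bins, w // bins).mean(axis=(2, 4))

def upsample_nearest(x, h, w):
    """Nearest-neighbour upsampling of a (C, b, b) grid back to (C, h, w)."""
    _, bh, bw = x.shape
    return np.repeat(np.repeat(x, h // bh, axis=1), w // bw, axis=2)

def pyramid_pooling(x, bins=(1, 2, 4)):
    """Average the input with its pooled-and-upsampled versions at several bin sizes."""
    _, h, w = x.shape
    levels = [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bins]
    return np.mean(levels, axis=0)

def neighborhood_mean(x, k=3):
    """k x k box mean at each location, edge-padded so the output keeps x's shape."""
    _, h, w = x.shape
    r = k // 2
    padded = np.pad(x, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def local_difference_attention(feat, bins=(1, 2, 4), k=3):
    """Reweight feat by how much its pyramid-pooled features differ from their neighbourhood.

    Returns the reweighted (C, H, W) map and the (H, W) score map. The score is 0.5
    where pooled features equal their neighbourhood mean and grows towards 1 as the
    local difference increases, so uniform (background-like) regions are damped.
    """
    pooled = pyramid_pooling(feat, bins)
    diff = np.abs(pooled - neighborhood_mean(pooled, k))   # (C, H, W) local differences
    score = 1.0 / (1.0 + np.exp(-diff.mean(axis=0)))        # (H, W) sigmoid scores
    return feat * score[None], score

# Usage: a blank 8x8 map with a small bright "head" blob in the middle.
feat = np.zeros((4, 8, 8))
feat[:, 3:5, 3:5] = 1.0
out, score = local_difference_attention(feat)
print(score[0, 0], score[3, 3] > score[0, 0])
```

Locations inside the blob receive higher scores than the flat corner, which is the qualitative behaviour the abstract describes: regions that stand out from their neighbourhood are emphasised, while homogeneous background is down-weighted. In the actual network the scores are learned rather than computed from fixed pooling.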
Original language | English |
---|---|
Pages (from-to) | 5165-5180 |
Number of pages | 16 |
Journal | Multimedia Tools and Applications |
Volume | 83 |
Issue number | 2 |
DOIs | |
State | Published - Jan 2024 |
Keywords
- Attention mechanism
- Convolutional neural network
- Crowd counting
- Deep learning
- FPN