Finding Nonrigid Tiny Person With Densely Cropped and Local Attention Object Detector Networks in Low-Altitude Aerial Images

Xiangqing Zhang; Yan Feng; Shun Zhang; Nan Wang; Shaohui Mei

doi:10.1109/JSTARS.2022.3175498

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Finding tiny persons in drone imagery has been, and remains, an integral and challenging task. Unmanned aerial vehicles (UAVs) flying at high speed, at low altitude, and from multiple perspectives produce objects at widely varying scales, which burdens the optimization of models. Moreover, detection performance on dense, faintly discernible persons is far lower than on large objects in high-resolution aerial images. In this article, we introduce an image cropping strategy and an attention mechanism based on YOLOv5 to address small person detection on the optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention object detector Network (DCLANet), inspired by the observation that the small area occupied by small objects should be fully focused on and relatively magnified in the original image. DCLANet combines Density Map-Guided Object Detection in aerial images (DMNet) and You Only Look Twice (YOLT): Rapid Multiscale Object Detection in Satellite Imagery to crop images at the training and testing stages, and adds a bottleneck attention mechanism to the YOLOv5 baseline framework so that it focuses more on person objects than on irrelevant categories. To further improve DCLANet, we also provide a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the detection result for the person category, $AP^{\text{val}}@0.5$, is 50.04% on the test-dev subset, which is better than the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, $AP^{\text{val}}@0.5$ and $AP^{\text{test}}@0.5$ reach 74.95% and 62.18%, respectively. Compared to YOLOv5, DCLANet improves by about 3.8%, which is encouraging and competitive.
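The abstract describes cropping images at training and testing time so that small persons occupy a larger share of each input. The sketch below illustrates one common way to do this, fixed-size overlapping tiles in the spirit of YOLT-style sliding windows. It is not the authors' implementation: the function names (`tile_starts`, `crop_windows`, `to_image_coords`), the 640-pixel tile size, and the 20% overlap are all assumptions chosen for illustration, and the density-map-guided cropping of DMNet is not modeled here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def tile_starts(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if length <= tile:
        return [0]  # the image is smaller than one tile along this axis
    stride = max(1, int(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def crop_windows(width: int, height: int,
                 tile: int = 640, overlap: float = 0.2) -> List[Box]:
    """Enumerate crop windows in row-major order (hypothetical defaults)."""
    xs = tile_starts(width, tile, overlap)
    ys = tile_starts(height, tile, overlap)
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in ys for x in xs]


def to_image_coords(box: Box, window: Box) -> Box:
    """Shift a box predicted inside a crop back to full-image coordinates."""
    x0, y0, _, _ = window
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


if __name__ == "__main__":
    for window in crop_windows(1920, 1080):
        print(window)
```

In a pipeline of this kind, each window would be passed to the detector separately, the resulting boxes mapped back with `to_image_coords`, and duplicates from the overlap regions merged with non-maximum suppression.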

Original language	English
Pages (from-to)	4371-4385
Number of pages	15
Journal	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Volume	15
DOIs	https://doi.org/10.1109/JSTARS.2022.3175498
State	Published - 2022

Keywords

Bottleneck attention mechanism (BAM)
densely cropped
small object detection
VisDrone2019 datasets
YOLOv5

Cite this

@article{4e508e0d108d4857bb5bdc913dee7e07,

title = "Finding Nonrigid Tiny Person With Densely Cropped and Local Attention Object Detector Networks in Low-Altitude Aerial Images",

abstract = "Finding tiny persons in drone imagery has been, and remains, an integral and challenging task. Unmanned aerial vehicles (UAVs) flying at high speed, at low altitude, and from multiple perspectives produce objects at widely varying scales, which burdens the optimization of models. Moreover, detection performance on dense, faintly discernible persons is far lower than on large objects in high-resolution aerial images. In this article, we introduce an image cropping strategy and an attention mechanism based on YOLOv5 to address small person detection on the optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention object detector Network (DCLANet), inspired by the observation that the small area occupied by small objects should be fully focused on and relatively magnified in the original image. DCLANet combines Density Map-Guided Object Detection in aerial images (DMNet) and You Only Look Twice (YOLT): Rapid Multiscale Object Detection in Satellite Imagery to crop images at the training and testing stages, and adds a bottleneck attention mechanism to the YOLOv5 baseline framework so that it focuses more on person objects than on irrelevant categories. To further improve DCLANet, we also provide a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the detection result for the person category, $AP^{\text{val}}@0.5$, is 50.04% on the test-dev subset, which is better than the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, $AP^{\text{val}}@0.5$ and $AP^{\text{test}}@0.5$ reach 74.95% and 62.18%, respectively. Compared to YOLOv5, DCLANet improves by about 3.8%, which is encouraging and competitive.",

keywords = "Bottleneck attention mechanism (BAM), densely cropped, small object detection, VisDrone2019 datasets, YOLOv5",

author = "Xiangqing Zhang and Yan Feng and Shun Zhang and Nan Wang and Shaohui Mei",

note = "Publisher Copyright: {\textcopyright} 2008-2012 IEEE.",

year = "2022",

doi = "10.1109/JSTARS.2022.3175498",

language = "English",

volume = "15",

pages = "4371--4385",

journal = "IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing",

issn = "1939-1404",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Finding Nonrigid Tiny Person With Densely Cropped and Local Attention Object Detector Networks in Low-Altitude Aerial Images

AU - Zhang, Xiangqing

AU - Feng, Yan

AU - Zhang, Shun

AU - Wang, Nan

AU - Mei, Shaohui

PY - 2022

Y1 - 2022

N2 - Finding tiny persons in drone imagery has been, and remains, an integral and challenging task. Unmanned aerial vehicles (UAVs) flying at high speed, at low altitude, and from multiple perspectives produce objects at widely varying scales, which burdens the optimization of models. Moreover, detection performance on dense, faintly discernible persons is far lower than on large objects in high-resolution aerial images. In this article, we introduce an image cropping strategy and an attention mechanism based on YOLOv5 to address small person detection on the optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention object detector Network (DCLANet), inspired by the observation that the small area occupied by small objects should be fully focused on and relatively magnified in the original image. DCLANet combines Density Map-Guided Object Detection in aerial images (DMNet) and You Only Look Twice (YOLT): Rapid Multiscale Object Detection in Satellite Imagery to crop images at the training and testing stages, and adds a bottleneck attention mechanism to the YOLOv5 baseline framework so that it focuses more on person objects than on irrelevant categories. To further improve DCLANet, we also provide a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the detection result for the person category, $AP^{\text{val}}@0.5$, is 50.04% on the test-dev subset, which is better than the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, $AP^{\text{val}}@0.5$ and $AP^{\text{test}}@0.5$ reach 74.95% and 62.18%, respectively. Compared to YOLOv5, DCLANet improves by about 3.8%, which is encouraging and competitive.

AB - Finding tiny persons in drone imagery has been, and remains, an integral and challenging task. Unmanned aerial vehicles (UAVs) flying at high speed, at low altitude, and from multiple perspectives produce objects at widely varying scales, which burdens the optimization of models. Moreover, detection performance on dense, faintly discernible persons is far lower than on large objects in high-resolution aerial images. In this article, we introduce an image cropping strategy and an attention mechanism based on YOLOv5 to address small person detection on the optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention object detector Network (DCLANet), inspired by the observation that the small area occupied by small objects should be fully focused on and relatively magnified in the original image. DCLANet combines Density Map-Guided Object Detection in aerial images (DMNet) and You Only Look Twice (YOLT): Rapid Multiscale Object Detection in Satellite Imagery to crop images at the training and testing stages, and adds a bottleneck attention mechanism to the YOLOv5 baseline framework so that it focuses more on person objects than on irrelevant categories. To further improve DCLANet, we also provide a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the detection result for the person category, $AP^{\text{val}}@0.5$, is 50.04% on the test-dev subset, which is better than the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, $AP^{\text{val}}@0.5$ and $AP^{\text{test}}@0.5$ reach 74.95% and 62.18%, respectively. Compared to YOLOv5, DCLANet improves by about 3.8%, which is encouraging and competitive.

KW - Bottleneck attention mechanism (BAM)

KW - densely cropped

KW - small object detection

KW - VisDrone2019 datasets

KW - YOLOv5

UR - http://www.scopus.com/inward/record.url?scp=85130507368&partnerID=8YFLogxK

U2 - 10.1109/JSTARS.2022.3175498

DO - 10.1109/JSTARS.2022.3175498

M3 - Article

AN - SCOPUS:85130507368

SN - 1939-1404

VL - 15

SP - 4371

EP - 4385

JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ER -
