AODet: Aerial Object Detection Using Transformers for Foreground Regions

Xiaoming Wang, Hao Chen, Xiangxiang Chu, Peng Wang

Research output: Contribution to journal › Article › peer-review


Abstract

Aerial object detection is an important task that has received significant attention in recent years. Aerial images typically depict small, sparse instances against a simple background, and that simple background provides only limited useful information. Based on this observation, we present a new transformer-based framework for aerial object detection. In contrast to previous methods that address sparsity through multistage pipelines involving region-of-interest (RoI) techniques or sparse convolutions, our method, referred to as AODet, enjoys two significant advantages: 1) AODet is a simple yet accurate object detector specialized for aerial object detection. It identifies background regions early and then operates only on the regions most likely to contain foreground objects, thereby significantly reducing redundant computation. The transformer further exploits context information among foreground regions, helping to retain high-quality detection results; and 2) instead of relying on sparse operations such as sparse convolutions, clustering algorithms, or RoI operations, AODet employs a transformer to detect objects from foreground proposals. Our approach is simpler and can be implemented with plain tensor manipulations. Extensive experiments were conducted on VisDrone and DOTA: AODet achieves 40.9 AP on VisDrone and 79.6 mAP on DOTA, demonstrating its effectiveness.
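To make the foreground-region idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of how "select likely-foreground cells with plain tensor manipulations, then refine them with a transformer" could look. The class name, layer sizes, and top-k selection rule are all assumptions introduced for illustration.

```python
import torch
import torch.nn as nn


class ForegroundRegionDetector(nn.Module):
    """Hypothetical sketch: score feature-map cells, keep the top-k
    likely-foreground cells with plain tensor ops (topk + gather),
    and refine only those tokens with a transformer encoder."""

    def __init__(self, channels=256, num_heads=8, num_layers=2, top_k=300):
        super().__init__()
        # Per-cell objectness score (assumed 1x1 conv head).
        self.score_head = nn.Conv2d(channels, 1, kernel_size=1)
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.top_k = top_k

    def forward(self, feats):                                   # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        scores = self.score_head(feats).flatten(2).squeeze(1)   # (B, H*W)
        k = min(self.top_k, h * w)
        _, idx = scores.topk(k, dim=1)                          # indices of foreground cells
        tokens = feats.flatten(2).transpose(1, 2)               # (B, H*W, C)
        idx_exp = idx.unsqueeze(-1).expand(-1, -1, c)
        fg_tokens = tokens.gather(1, idx_exp)                   # (B, k, C), foreground only
        refined = self.encoder(fg_tokens)                       # context among foreground regions
        return refined, idx


# Usage on a dummy feature map: only 300 of the 4096 cells reach the transformer.
feats = torch.randn(2, 256, 64, 64)
refined, idx = ForegroundRegionDetector()(feats)
print(refined.shape)  # torch.Size([2, 300, 256])
```

The point of the sketch is the data flow: background cells are dropped before the transformer runs, so attention cost scales with the number of retained foreground tokens rather than with the full image grid.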

Original language: English
Article number: 4106711
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 62
State: Published - 2024
Externally published: Yes

Keywords

  • Aerial object detection
  • transformer

