动态特征融合的遥感图像目标检测

Translated title of the contribution: Dynamic Feature Fusion for Object Detection in Remote Sensing Images

Xing Xing Xie, Gong Cheng, Yan Qing Yao, Xi Wen Yao, Jun Wei Han

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Remote sensing images contain rich and valuable information and have opened a door to observing and measuring the earth's surface. Thanks to advances in earth observation techniques, the number of remote sensing images with different spectral and spatial resolutions is increasing daily, and understanding these huge volumes of images is becoming more and more important. As a fundamental task of remote sensing image understanding, object detection in remote sensing images has been an active research area. Its goal is to locate ground objects and classify them into different categories, and it supports a wide range of real-world applications, including aerial reconnaissance, emergency rescue, and urban management. In recent years, deep learning techniques and large-scale annotated datasets have brought major improvements to general object detection, e.g., Fast/Faster R-CNN, RetinaNet, and FCOS. Driven by these improvements, object detection in remote sensing images has achieved significant progress. However, large variations in object size and inter-class similarity remain two major challenges. Many works have been introduced to address them. One typical method, the Feature Pyramid Network (FPN), creates a feature pyramid with strong semantics at all scales by combining low-resolution, semantically strong features with high-resolution, semantically weak features. Subsequently, Libra R-CNN fuses the features of different scales with the same weights to enhance the discriminability of features. PANet enhances the entire feature hierarchy by top-down and bottom-up path augmentation, which shortens the information path between lower features and top ones. These methods greatly improve detection accuracy.
However, most of them use fixed weights to fuse the features of different scales, so all input images share the same fusion scheme, which ignores the influence of the object scales in each input image on feature fusion. On the one hand, such a feature fusion approach is static: it cannot adapt the fusion weights to the size of the objects, which limits the robustness of detection. On the other hand, it can introduce useless features and suppress the feature representation during fusion. To this end, we design a dynamic feature fusion network to minimize the influence of these variations and improve the representation of features. The network contains a feature gate module and a dynamic fusion module. The feature gate module selectively attenuates useless features and enhances useful features before dynamic feature fusion, minimizing the interference of background information with the subsequent fusion. We model it with a gate unit that consists of spatial, channel, and global attention. The dynamic fusion module establishes the connection between object scales and feature fusion weights, so that the fusion weights are learned dynamically according to object scales. We achieve this with a lightweight fully-connected network that takes the multi-scale features as input. Finally, we build the dynamic feature fusion network on Faster R-CNN with FPN and conduct extensive experiments on two large-scale remote sensing image object detection datasets, DIOR and DOTA. The experimental results demonstrate the effectiveness of the proposed method.
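The contrast between fixed-weight fusion and input-dependent fusion can be illustrated with a small sketch. This is not the authors' implementation: the abstract states only that a lightweight fully-connected network takes the multi-scale features as input and yields fusion weights. The global average pooling, the single hidden layer with ReLU, the softmax normalization, the random untrained parameters, and every function name below are assumptions made for illustration. The feature gate module is omitted, and the feature maps are assumed to be already resized to a common shape.

```python
import math
import random


def global_avg_pool(feature):
    """Reduce a feature map (C channels, each a flat list of H*W values) to C means."""
    return [sum(channel) / len(channel) for channel in feature]


def linear(x, weight, bias):
    """Fully-connected layer; weight is indexed as [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    """Normalize logits into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, n_out, n_in):
    """Random, untrained parameters (assumption: stand-in for learned weights)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def dynamic_fusion_weights(features, params):
    """Predict one fusion weight per pyramid level from the input's own features."""
    # Assumed design: pool each level, concatenate, then a two-layer FC network.
    descriptor = [v for feature in features for v in global_avg_pool(feature)]
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in linear(descriptor, w1, b1)]  # ReLU
    return softmax(linear(hidden, w2, b2))


def fuse(features, weights):
    """Weighted sum of same-shaped feature maps, one weight per level."""
    n_channels, n_pixels = len(features[0]), len(features[0][0])
    return [
        [sum(w * f[c][p] for w, f in zip(weights, features)) for p in range(n_pixels)]
        for c in range(n_channels)
    ]


def random_features(rng, levels, channels, pixels):
    """Hypothetical stand-in for resized multi-scale features of one image."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(pixels)] for _ in range(channels)]
        for _ in range(levels)
    ]


rng = random.Random(0)
levels, channels, pixels, hidden_dim = 3, 4, 16, 8
params = (
    init_linear(rng, hidden_dim, levels * channels),
    init_linear(rng, levels, hidden_dim),
)

image_a = random_features(rng, levels, channels, pixels)
weights_a = dynamic_fusion_weights(image_a, params)      # depends on image_a
fused_dynamic = fuse(image_a, weights_a)                 # input-dependent fusion
fused_static = fuse(image_a, [1.0 / levels] * levels)    # fixed equal weights
```

In the static case every image is fused with the same weights, whereas `dynamic_fusion_weights` recomputes the weights from each image's pooled features. In a real detector the fully-connected parameters would be trained end to end with the rest of the network.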

Original language: Chinese (Traditional)
Pages (from-to): 735-747
Number of pages: 13
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 45
Issue number: 4
State: Published - Apr 2022
