动态特征融合的遥感图像目标检测

Xing Xing Xie; Gong Cheng; Yan Qing Yao; Xi Wen Yao; Jun Wei Han

doi:10.11897/SP.J.1016.2022.00735

Translated title of the contribution: Dynamic Feature Fusion for Object Detection in Remote Sensing Images

School of Automation

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Remote sensing images contain rich and valuable information, which has opened a door for observing and measuring the earth's surface. Thanks to advances in earth observation techniques, the volume of remote sensing images with different spectral and spatial resolutions is growing daily, and understanding these huge volumes of imagery is becoming increasingly important. As a fundamental task of remote sensing image understanding, object detection in remote sensing images has been an active research area. Its goal is to locate ground objects and classify them into different categories, and it supports a wide range of real-world applications, including aerial reconnaissance, emergency rescue, and urban management. In recent years, deep learning techniques and large-scale annotated datasets have brought major improvements in general object detection, e.g., Fast/Faster R-CNN, RetinaNet, and FCOS. Driven by these improvements, object detection in remote sensing images has achieved significant progress. However, large variations in object size and inter-class similarity remain two major challenges. Many methods have been proposed to address them. A typical one, Feature Pyramid Network (FPN), creates a feature pyramid with strong semantics at all scales by combining low-resolution, semantically strong features with high-resolution, semantically weak features. Subsequently, Libra R-CNN fuses the features of different scales with the same weights to enhance the discriminability of features. PANet enhances the entire feature hierarchy through top-down and bottom-up path augmentation, which shortens the information path between lower-level and top-level features. These methods greatly improve detection accuracy.
However, most of them use fixed weights to fuse the features of different scales, so all input images share the same fusion scheme and the influence of the object scales in each input image on feature fusion is ignored. On the one hand, such feature fusion is static: it cannot adaptively change the fusion weights according to object size, which limits the robustness of detection. On the other hand, it can introduce useless features and suppress the feature representation during fusion. To this end, we design a dynamic feature fusion network to minimize the influence of object scale variations and improve the feature representation. The network contains a feature gate module and a dynamic fusion module. The feature gate module selectively attenuates useless features and enhances useful ones before dynamic feature fusion, minimizing the interference of background information with the subsequent fusion. We model it with a gate unit consisting of spatial, channel, and global attention. The dynamic fusion module establishes the connection between object scales and feature fusion weights, so that the fusion weights are learned dynamically according to object scales. We achieve this with a lightweight fully-connected network that takes the multi-scale features as input. Finally, we build the dynamic feature fusion network on Faster R-CNN with FPN and conduct extensive experiments on two large-scale remote sensing object detection datasets, DIOR and DOTA. The experimental results demonstrate the effectiveness of the proposed method.
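
The contrast between static and dynamic fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that a lightweight fully-connected network takes the multi-scale features as input and produces the fusion weights. Global average pooling, a single fully-connected layer, softmax normalization, random parameters, and every function name below are assumptions made for illustration. The feature levels are assumed to be already resized to a common resolution, and the feature gate module is omitted.

```python
import math
import random


def global_avg_pool(feature):
    """Average a C x H x W feature map (nested lists) over H and W."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion_weights(features, fc_weight, fc_bias):
    """Predict one fusion weight per level from the input features.

    Assumption: pooled descriptors of all levels are concatenated and
    passed through one fully-connected layer followed by softmax.
    """
    descriptor = [v for f in features for v in global_avg_pool(f)]
    scores = [
        sum(w * x for w, x in zip(row, descriptor)) + b
        for row, b in zip(fc_weight, fc_bias)
    ]
    return softmax(scores)


def fuse(features, weights):
    """Weighted element-wise sum of same-shape C x H x W feature maps."""
    channels = len(features[0])
    height = len(features[0][0])
    width = len(features[0][0][0])
    return [
        [
            [
                sum(w * f[c][y][x] for w, f in zip(weights, features))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for c in range(channels)
    ]


# Toy input: 3 pyramid levels, each 2 channels of 4 x 4 (hypothetical sizes).
random.seed(0)
levels, C, H, W = 3, 2, 4, 4
features = [
    [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    for _ in range(levels)
]

# Untrained, randomly initialized parameters (for illustration only).
fc_weight = [[random.uniform(-1, 1) for _ in range(levels * C)] for _ in range(levels)]
fc_bias = [0.0] * levels

# Static fusion: every image uses the same equal weights.
static_fused = fuse(features, [1.0 / levels] * levels)

# Dynamic fusion: the weights depend on the content of this input.
weights = dynamic_fusion_weights(features, fc_weight, fc_bias)
fused = fuse(features, weights)
print("dynamic fusion weights:", weights)
```

With all-zero fully-connected parameters the predicted weights become uniform, so the sketch reduces to equal-weight static fusion; in a trained network the parameters would be learned, so the weights would vary with each input image.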

Original language	Chinese (Traditional)
Pages (from-to)	735-747
Number of pages	13
Journal	Jisuanji Xuebao/Chinese Journal of Computers
Volume	45
Issue number	4
DOIs	https://doi.org/10.11897/SP.J.1016.2022.00735
State	Published - Apr 2022

Cite this

@article{b2a0fc630e174c04916070cc69ba8c1f,

title = "动态特征融合的遥感图像目标检测",

keywords = "Dynamic feature fusion, Feature gate, Object detection, Remote sensing images",

author = "Xie, {Xing Xing} and Gong Cheng and Yao, {Yan Qing} and Yao, {Xi Wen} and Han, {Jun Wei}",

year = "2022",

month = apr,

doi = "10.11897/SP.J.1016.2022.00735",

language = "Chinese (Traditional)",

volume = "45",

pages = "735--747",

journal = "Jisuanji Xuebao/Chinese Journal of Computers",

issn = "0254-4164",

publisher = "Science Press",

number = "4",

}

TY - JOUR

T1 - 动态特征融合的遥感图像目标检测

AU - Xie, Xing Xing

AU - Cheng, Gong

AU - Yao, Yan Qing

AU - Yao, Xi Wen

AU - Han, Jun Wei

PY - 2022/4

Y1 - 2022/4

KW - Dynamic feature fusion

KW - Feature gate

KW - Object detection

KW - Remote sensing images

UR - http://www.scopus.com/inward/record.url?scp=85128868547&partnerID=8YFLogxK

U2 - 10.11897/SP.J.1016.2022.00735

DO - 10.11897/SP.J.1016.2022.00735

M3 - Article

AN - SCOPUS:85128868547

SN - 0254-4164

VL - 45

SP - 735

EP - 747

JO - Jisuanji Xuebao/Chinese Journal of Computers

JF - Jisuanji Xuebao/Chinese Journal of Computers

IS - 4

ER -
