TY - JOUR
T1 - Cross-Scale Feature Fusion for Object Detection in Optical Remote Sensing Images
AU - Cheng, Gong
AU - Si, Yongjie
AU - Hong, Hailong
AU - Yao, Xiwen
AU - Guo, Lei
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2021/3
Y1 - 2021/3
AB - For the time being, there are many groundbreaking object detection frameworks used in natural scene images. These algorithms have good detection performance on the data sets of open natural scenes. However, applying these frameworks to remote sensing images directly is not very effective. The existing deep-learning-based object detection algorithms still face some challenges when dealing with remote sensing images because these images usually contain a number of targets with large variations of object sizes as well as interclass similarity. Aiming at the challenges of object detection in optical remote sensing images, we propose an end-to-end cross-scale feature fusion (CSFF) framework, which can effectively improve the object detection accuracy. Specifically, we first use a feature pyramid network (FPN) to obtain multilevel feature maps and then insert a squeeze and excitation (SE) block into the top layer to model the relationship between different feature channels. Next, we use the CSFF module to obtain powerful and discriminative multilevel feature representations. Finally, we implement our work in the framework of Faster region-based CNN (R-CNN). In the experiment, we evaluate our method on a publicly available large-scale data set, named DIOR, and obtain an improvement of 3.0% measured in terms of mAP compared with Faster R-CNN with FPN.
KW - Convolutional neural networks (CNNs)
KW - cross-scale feature fusion (CSFF)
KW - object detection
KW - remote sensing images
UR - http://www.scopus.com/inward/record.url?scp=85101858852&partnerID=8YFLogxK
U2 - 10.1109/LGRS.2020.2975541
DO - 10.1109/LGRS.2020.2975541
M3 - Article
AN - SCOPUS:85101858852
SN - 1545-598X
VL - 18
SP - 431
EP - 435
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
IS - 3
M1 - 9024005
ER -