TY - JOUR
T1 - Learning Discriminative Representation for Fine-Grained Object Detection in Remote Sensing Images
AU - Xie, Xingxing
AU - Cheng, Gong
AU - Li, Wenbo
AU - Lang, Chunbo
AU - Zhang, Peng
AU - Yao, Yanqing
AU - Han, Junwei
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Fine-grained object detection (FGOD) in remote sensing images is an emerging and challenging task in intelligent image interpretation. It aims to localize objects while classifying them into fine-grained categories. Modern FGOD methods are mainly derived from well-developed detectors and have made compelling progress. Even so, they struggle to classify objects at the subordinate level because of limitations in how they represent object features. In this paper, we propose DRNet, a network that learns discriminative representation (DR) for fine-grained object detection in remote sensing images. First, we introduce a fine-grained branch that works in parallel with the other task branches; it re-encodes objects' features with dual refinement to generate a discriminative representation, enabling accurate fine-grained classification. Second, to train the fine-grained branch, we design a confusion-minimized loss that automatically scales loss contributions according to the separability of samples, further boosting the discriminative ability of the representation and better handling hard-to-distinguish objects. Moreover, we devise an interaction verification strategy that lets the network fully use the results of both fine-grained and coarse classification for robust inference. On the large-scale FAIR1M-1.0 and FAIR1M-2.0 datasets, DRNet with ResNet50 and a 1× training schedule obtains 40.87% mAP and 47.04% mAP, respectively, establishing a new state of the art for fine-grained object detection in remote sensing images. The source code is available at https://github.com/54wb/DRNet.
AB - Fine-grained object detection (FGOD) in remote sensing images is an emerging and challenging task in intelligent image interpretation. It aims to localize objects while classifying them into fine-grained categories. Modern FGOD methods are mainly derived from well-developed detectors and have made compelling progress. Even so, they struggle to classify objects at the subordinate level because of limitations in how they represent object features. In this paper, we propose DRNet, a network that learns discriminative representation (DR) for fine-grained object detection in remote sensing images. First, we introduce a fine-grained branch that works in parallel with the other task branches; it re-encodes objects' features with dual refinement to generate a discriminative representation, enabling accurate fine-grained classification. Second, to train the fine-grained branch, we design a confusion-minimized loss that automatically scales loss contributions according to the separability of samples, further boosting the discriminative ability of the representation and better handling hard-to-distinguish objects. Moreover, we devise an interaction verification strategy that lets the network fully use the results of both fine-grained and coarse classification for robust inference. On the large-scale FAIR1M-1.0 and FAIR1M-2.0 datasets, DRNet with ResNet50 and a 1× training schedule obtains 40.87% mAP and 47.04% mAP, respectively, establishing a new state of the art for fine-grained object detection in remote sensing images. The source code is available at https://github.com/54wb/DRNet.
KW - confusion-minimized loss
KW - discriminative representation learning
KW - fine-grained branch
KW - fine-grained object detection
UR - http://www.scopus.com/inward/record.url?scp=85219131810&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2025.3544741
DO - 10.1109/TCSVT.2025.3544741
M3 - Article
AN - SCOPUS:85219131810
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -