TY - JOUR
T1 - M-RRFS
T2 - A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection
AU - Huang, Peiliang
AU - Zhang, Dingwen
AU - Cheng, De
AU - Han, Longfei
AU - Zhu, Pengfei
AU - Han, Junwei
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024/10
Y1 - 2024/10
N2 - With the goal to detect both the object categories appearing in the training phase and those never have been observed before testing, zero-shot object detection (ZSD) becomes a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the out-standing challenges that ZSD presents compared with ZSC—severe intra-class variation, complex category co-occurrence, open test scenario, and reveal their interference to the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), the Inter-class Structure Preserving (InterSP), and the Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast problems. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen class domain and unseen class domain to reduce the overlap between the distributions of them. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy. To evaluate the proposed approach, comprehensive experiments on MS-COCO, PASCAL VOC, ILSVRC and DIOR datasets are conducted, and superior performances have been achieved. Notably, we achieve new state-of-the-art performances on MS-COCO dataset, i.e., 64.0%, 60.9% and 55.5% Recall@100 with IoU =0.4,0.5,0.6 respectively, and 15.1% mAp with IoU=0.5, under the 48/17 category split setting. Meanwhile, experiments on the DIOR dataset actually build the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images. https://github.com/HPL123/M-RRFS.
AB - With the goal to detect both the object categories appearing in the training phase and those never have been observed before testing, zero-shot object detection (ZSD) becomes a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the out-standing challenges that ZSD presents compared with ZSC—severe intra-class variation, complex category co-occurrence, open test scenario, and reveal their interference to the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), the Inter-class Structure Preserving (InterSP), and the Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast problems. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen class domain and unseen class domain to reduce the overlap between the distributions of them. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy. To evaluate the proposed approach, comprehensive experiments on MS-COCO, PASCAL VOC, ILSVRC and DIOR datasets are conducted, and superior performances have been achieved. Notably, we achieve new state-of-the-art performances on MS-COCO dataset, i.e., 64.0%, 60.9% and 55.5% Recall@100 with IoU =0.4,0.5,0.6 respectively, and 15.1% mAp with IoU=0.5, under the 48/17 category split setting. Meanwhile, experiments on the DIOR dataset actually build the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images. https://github.com/HPL123/M-RRFS.
KW - Object detection
KW - Region feature synthesis
KW - Zero-shot learning
KW - Zero-shot object detection
UR - http://www.scopus.com/inward/record.url?scp=85194106124&partnerID=8YFLogxK
U2 - 10.1007/s11263-024-02112-9
DO - 10.1007/s11263-024-02112-9
M3 - 文章
AN - SCOPUS:85194106124
SN - 0920-5691
VL - 132
SP - 4651
EP - 4672
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 10
ER -