M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection

Peiliang Huang; Dingwen Zhang; De Cheng; Longfei Han; Pengfei Zhu; Junwei Han

doi:10.1007/s11263-024-02112-9

School of Automation

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

With the goal of detecting both the object categories appearing in the training phase and those that have never been observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC (severe intra-class variation, complex category co-occurrence, and open test scenarios) and reveal their interference with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), the Inter-class Structure Preserving (InterSP), and the Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast problems. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen class domain and the unseen class domain in order to reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy. To evaluate the proposed approach, comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC and DIOR datasets are conducted, and superior performance is achieved. Notably, we achieve new state-of-the-art performance on the MS-COCO dataset, i.e., 64.0%, 60.9% and 55.5% Recall@100 with IoU = 0.4, 0.5 and 0.6, respectively, and 15.1% mAP with IoU = 0.5, under the 48/17 category split setting. Meanwhile, experiments on the DIOR dataset build the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images.
https://github.com/HPL123/M-RRFS.

Original language	English
Pages (from-to)	4651-4672
Number of pages	22
Journal	International Journal of Computer Vision
Volume	132
Issue number	10
DOIs	https://doi.org/10.1007/s11263-024-02112-9
State	Published - Oct 2024

Keywords

Object detection
Region feature synthesis
Zero-shot learning
Zero-shot object detection

Cite this

@article{adcbfb0b7a61437cb37fdec7df0346a0,

title = "M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection",

abstract = "With the goal of detecting both the object categories appearing in the training phase and those that have never been observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC (severe intra-class variation, complex category co-occurrence, and open test scenarios) and reveal their interference with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), the Inter-class Structure Preserving (InterSP), and the Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast problems. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen class domain and the unseen class domain in order to reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy. To evaluate the proposed approach, comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC and DIOR datasets are conducted, and superior performance is achieved. Notably, we achieve new state-of-the-art performance on the MS-COCO dataset, i.e., 64.0%, 60.9% and 55.5% Recall@100 with IoU = 0.4, 0.5 and 0.6, respectively, and 15.1% mAP with IoU = 0.5, under the 48/17 category split setting. Meanwhile, experiments on the DIOR dataset build the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images. https://github.com/HPL123/M-RRFS.",

keywords = "Object detection, Region feature synthesis, Zero-shot learning, Zero-shot object detection",

author = "Peiliang Huang and Dingwen Zhang and De Cheng and Longfei Han and Pengfei Zhu and Junwei Han",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.",

year = "2024",

month = oct,

doi = "10.1007/s11263-024-02112-9",

language = "English",

volume = "132",

pages = "4651--4672",

journal = "International Journal of Computer Vision",

issn = "0920-5691",

publisher = "Springer Netherlands",

number = "10",

}

TY - JOUR

T1 - M-RRFS

T2 - A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection

AU - Huang, Peiliang

AU - Zhang, Dingwen

AU - Cheng, De

AU - Han, Longfei

AU - Zhu, Pengfei

AU - Han, Junwei

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

PY - 2024/10

Y1 - 2024/10

N2 - With the goal of detecting both the object categories appearing in the training phase and those that have never been observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC (severe intra-class variation, complex category co-occurrence, and open test scenarios) and reveal their interference with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), the Inter-class Structure Preserving (InterSP), and the Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast problems. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen class domain and the unseen class domain in order to reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy. To evaluate the proposed approach, comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC and DIOR datasets are conducted, and superior performance is achieved. Notably, we achieve new state-of-the-art performance on the MS-COCO dataset, i.e., 64.0%, 60.9% and 55.5% Recall@100 with IoU = 0.4, 0.5 and 0.6, respectively, and 15.1% mAP with IoU = 0.5, under the 48/17 category split setting. Meanwhile, experiments on the DIOR dataset build the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images.
https://github.com/HPL123/M-RRFS.

AB - With the goal of detecting both the object categories appearing in the training phase and those that have never been observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC (severe intra-class variation, complex category co-occurrence, and open test scenarios) and reveal their interference with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), the Inter-class Structure Preserving (InterSP), and the Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast problems. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen class domain and the unseen class domain in order to reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy. To evaluate the proposed approach, comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC and DIOR datasets are conducted, and superior performance is achieved. Notably, we achieve new state-of-the-art performance on the MS-COCO dataset, i.e., 64.0%, 60.9% and 55.5% Recall@100 with IoU = 0.4, 0.5 and 0.6, respectively, and 15.1% mAP with IoU = 0.5, under the 48/17 category split setting. Meanwhile, experiments on the DIOR dataset build the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images.
https://github.com/HPL123/M-RRFS.

KW - Object detection

KW - Region feature synthesis

KW - Zero-shot learning

KW - Zero-shot object detection

UR - http://www.scopus.com/inward/record.url?scp=85194106124&partnerID=8YFLogxK

U2 - 10.1007/s11263-024-02112-9

DO - 10.1007/s11263-024-02112-9

M3 - Article

AN - SCOPUS:85194106124

SN - 0920-5691

VL - 132

SP - 4651

EP - 4672

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

IS - 10

ER -
