Learning an Invariant and Equivariant Network for Weakly Supervised Object Detection

Xiaoxu Feng, Xiwen Yao, Hui Shen, Gong Cheng, Bin Xiao, Junwei Han

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Weakly Supervised Object Detection (WSOD) is of increasing importance in the computer vision community owing to its extensive applications and low manual annotation cost. Most advanced WSOD approaches build upon an indefinite and quality-agnostic framework, leading to unstable and incomplete object detectors. This paper attributes these issues to inconsistent learning across object variations and to unawareness of localization quality, and constructs a novel end-to-end Invariant and Equivariant Network (IENet). IENet is implemented with flexible multi-branch online refinement, which makes it naturally more comprehensive in perceiving various objects. Specifically, IENet first performs label propagation from the predicted instances to their transformed ones in a progressive manner, achieving affine-invariant learning. Meanwhile, IENet also uses rotation-equivariant learning as a pretext task and derives an instance-level rotation-equivariant branch to be aware of localization quality. With affine-invariant learning and rotation-equivariant learning, IENet encourages consistent and holistic feature learning for WSOD without additional annotations. On challenging datasets of both natural scenes and aerial scenes, IENet substantially boosts WSOD to new state-of-the-art performance. The code has been released at: https://github.com/XiaoxFeng/IENet.
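
The two consistency ideas named in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation (see the released repository for that): it assumes integer, pixel-inclusive boxes `(x1, y1, x2, y2)`, a single 90-degree counterclockwise rotation as the transformation, and hypothetical helper names (`rot90_grid`, `rot90_box`, `propagate`). It shows that a class label can be carried over unchanged to the transformed instance (invariance), while the box coordinates must move with the transformation (equivariance).

```python
def rot90_grid(grid):
    """Rotate a 2-D list of values by 90 degrees counterclockwise."""
    return [list(col) for col in zip(*grid)][::-1]


def rot90_box(box, width):
    """Map an inclusive box (x1, y1, x2, y2) through the same rotation.

    `width` is the width of the image BEFORE rotation. A pixel at
    (x, y) moves to (y, width - 1 - x).
    """
    x1, y1, x2, y2 = box
    return (y1, width - 1 - x2, y2, width - 1 - x1)


def box_to_grid(box, height, width):
    """Rasterize a box into a boolean grid of shape height x width."""
    x1, y1, x2, y2 = box
    return [[x1 <= x <= x2 and y1 <= y <= y2 for x in range(width)]
            for y in range(height)]


def grid_to_box(grid):
    """Recover the tight inclusive bounding box of the True cells."""
    ys = [y for y, row in enumerate(grid) if any(row)]
    xs = [x for row in grid for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


def propagate(instance, width):
    """Propagate a (box, label) pair to the rotated image.

    The label is kept as-is (invariant); the box is transformed
    together with the image (equivariant).
    """
    box, label = instance
    return (rot90_box(box, width), label)


if __name__ == "__main__":
    height, width = 6, 8
    instance = ((1, 2, 4, 3), "car")  # hypothetical predicted instance

    new_box, new_label = propagate(instance, width)

    # Consistency check: rotating the rasterized box gives the same
    # region as transforming the box coordinates directly.
    rotated = rot90_grid(box_to_grid(instance[0], height, width))
    assert grid_to_box(rotated) == new_box
    assert new_label == instance[1]
```

In a detector, the analogous check would compare predictions on the transformed image against the transformed predictions from the original image; how IENet turns this into training signals is described in the paper itself, not here.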

Original language: English
Pages (from-to): 11977-11992
Number of pages: 16
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 10
State: Published - 1 Oct 2023

Keywords

  • equivariant learning
  • invariant learning
  • object detection
  • weakly supervised learning
