TY - JOUR
T1 - Learning an Invariant and Equivariant Network for Weakly Supervised Object Detection
AU - Feng, Xiaoxu
AU - Yao, Xiwen
AU - Shen, Hui
AU - Cheng, Gong
AU - Xiao, Bin
AU - Han, Junwei
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2023/10/1
Y1 - 2023/10/1
AB - Weakly Supervised Object Detection (WSOD) is of increasing importance in the computer vision community owing to its extensive applications and low manual annotation cost. Most advanced WSOD approaches build upon an indefinite and quality-agnostic framework, leading to unstable and incomplete object detectors. This paper attributes these issues to inconsistent learning of object variations and unawareness of localization quality, and constructs a novel end-to-end Invariant and Equivariant Network (IENet). IENet is implemented with a flexible multi-branch online refinement, which makes it naturally more comprehensive in perceiving various objects. Specifically, IENet first performs label propagation from the predicted instances to their transformed counterparts in a progressive manner, achieving affine-invariant learning. Meanwhile, IENet also naturally utilizes rotation-equivariant learning as a pretext task and derives an instance-level rotation-equivariant branch to be aware of the localization quality. With affine-invariant learning and rotation-equivariant learning, IENet promotes consistent and holistic feature learning for WSOD without additional annotations. On challenging datasets of both natural scenes and aerial scenes, we substantially boost WSOD to new state-of-the-art performance. The code has been released at: https://github.com/XiaoxFeng/IENet.
KW - equivariant learning
KW - invariant learning
KW - Object detection
KW - weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85159836001&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2023.3275142
DO - 10.1109/TPAMI.2023.3275142
M3 - Article
C2 - 37167047
AN - SCOPUS:85159836001
SN - 0162-8828
VL - 45
SP - 11977
EP - 11992
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 10
ER -