Weakly Supervised Object Detection Using Proposal- and Semantic-Level Relationships

Dingwen Zhang; Wenyuan Zeng; Jieru Yao; Junwei Han

doi:10.1109/TPAMI.2020.3046647

School of Automation

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

95 Scopus citations

Abstract

In recent years, weakly supervised object detection has attracted great attention in the computer vision community. Although numerous deep learning-based approaches have been proposed in the past few years, this ill-posed problem remains challenging and learning performance still falls short of expectations. In fact, most existing approaches consider only the visual appearance of each proposal region and neglect helpful context information. To this end, this paper introduces two levels of context into the weakly supervised learning framework. The first is the proposal-level context, i.e., the relationship between spatially adjacent proposals. The second is the semantic-level context, i.e., the relationship between co-occurring object categories. The proposed weakly supervised learning framework therefore contains not only the cognition process on visual appearance but also the reasoning process on the proposal- and semantic-level relationships, which leads to a novel deep multiple instance reasoning framework. Specifically, built upon a conventional CNN-based network architecture, the proposed framework is equipped with two additional graph convolutional network-based reasoning models that implement object location reasoning and multi-label reasoning within an end-to-end network training procedure. Comprehensive experiments on the widely used PASCAL VOC and MS COCO benchmarks demonstrate the superior capacity of the proposed approach compared with other state-of-the-art methods and baseline models.

Original language	English
Pages (from-to)	3349-3363
Number of pages	15
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	44
Issue number	6
DOIs	https://doi.org/10.1109/TPAMI.2020.3046647
State	Published - 1 Jun 2022

Keywords

Weakly supervised object detection
graphical convolutional network
multiple-instance learning

Cite this

@article{2f0be1f903c24899adb239e88825f6da,

title = "Weakly Supervised Object Detection Using Proposal- and Semantic-Level Relationships",

abstract = "In recent years, weakly supervised object detection has attracted great attention in the computer vision community. Although numerous deep learning-based approaches have been proposed in the past few years, such an ill-posed problem is still challenging and the learning performance is still behind the expectation. In fact, most of the existing approaches only consider the visual appearance of each proposal region but ignore to make use of the helpful context information. To this end, this paper introduces two levels of context into the weakly supervised learning framework. The first one is the proposal-level context, i.e., the relationship of the spatially adjacent proposals. The second one is the semantic-level context, i.e., the relationship of the co-occurring object categories. Therefore, the proposed weakly supervised learning framework contains not only the cognition process on the visual appearance but also the reasoning process on the proposal- and semantic-level relationships, which leads to the novel deep multiple instance reasoning framework. Specifically, built upon a conventional CNN-based network architecture, the proposed framework is equipped with two additional graph convolutional network-based reasoning models to implement object location reasoning and multi-label reasoning within an end-to-end network training procedure. Comprehensive experiments on the widely used PASCAL VOC and MS COCO benchmarks have been implemented, which demonstrate the superior capacity of the proposed approach when compared with other state-of-the-art methods and baseline models.",

keywords = "Weakly supervised object detection, graphical convolutional network, multiple-instance learning",

author = "Dingwen Zhang and Wenyuan Zeng and Jieru Yao and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2022",

month = jun,

day = "1",

doi = "10.1109/TPAMI.2020.3046647",

language = "English",

volume = "44",

pages = "3349--3363",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "6",

}

TY - JOUR

T1 - Weakly Supervised Object Detection Using Proposal- and Semantic-Level Relationships

AU - Zhang, Dingwen

AU - Zeng, Wenyuan

AU - Yao, Jieru

AU - Han, Junwei

PY - 2022/6/1

Y1 - 2022/6/1

N2 - In recent years, weakly supervised object detection has attracted great attention in the computer vision community. Although numerous deep learning-based approaches have been proposed in the past few years, such an ill-posed problem is still challenging and the learning performance is still behind the expectation. In fact, most of the existing approaches only consider the visual appearance of each proposal region but ignore to make use of the helpful context information. To this end, this paper introduces two levels of context into the weakly supervised learning framework. The first one is the proposal-level context, i.e., the relationship of the spatially adjacent proposals. The second one is the semantic-level context, i.e., the relationship of the co-occurring object categories. Therefore, the proposed weakly supervised learning framework contains not only the cognition process on the visual appearance but also the reasoning process on the proposal- and semantic-level relationships, which leads to the novel deep multiple instance reasoning framework. Specifically, built upon a conventional CNN-based network architecture, the proposed framework is equipped with two additional graph convolutional network-based reasoning models to implement object location reasoning and multi-label reasoning within an end-to-end network training procedure. Comprehensive experiments on the widely used PASCAL VOC and MS COCO benchmarks have been implemented, which demonstrate the superior capacity of the proposed approach when compared with other state-of-the-art methods and baseline models.

AB - In recent years, weakly supervised object detection has attracted great attention in the computer vision community. Although numerous deep learning-based approaches have been proposed in the past few years, such an ill-posed problem is still challenging and the learning performance is still behind the expectation. In fact, most of the existing approaches only consider the visual appearance of each proposal region but ignore to make use of the helpful context information. To this end, this paper introduces two levels of context into the weakly supervised learning framework. The first one is the proposal-level context, i.e., the relationship of the spatially adjacent proposals. The second one is the semantic-level context, i.e., the relationship of the co-occurring object categories. Therefore, the proposed weakly supervised learning framework contains not only the cognition process on the visual appearance but also the reasoning process on the proposal- and semantic-level relationships, which leads to the novel deep multiple instance reasoning framework. Specifically, built upon a conventional CNN-based network architecture, the proposed framework is equipped with two additional graph convolutional network-based reasoning models to implement object location reasoning and multi-label reasoning within an end-to-end network training procedure. Comprehensive experiments on the widely used PASCAL VOC and MS COCO benchmarks have been implemented, which demonstrate the superior capacity of the proposed approach when compared with other state-of-the-art methods and baseline models.

KW - Weakly supervised object detection

KW - graphical convolutional network

KW - multiple-instance learning

UR - http://www.scopus.com/inward/record.url?scp=85098800087&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2020.3046647

DO - 10.1109/TPAMI.2020.3046647

M3 - Article

C2 - 33351751

AN - SCOPUS:85098800087

SN - 0162-8828

VL - 44

SP - 3349

EP - 3363

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 6

ER -
