Learning Object Detectors With Semi-Annotated Weak Labels

Dingwen Zhang; Junwei Han; Guangyu Guo; Long Zhao

doi:10.1109/TCSVT.2018.2884173

School of Automation

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

To reduce the human labor of annotating training data for object detectors, recent research has focused on semi-supervised object detection (SSOD) and weakly supervised object detection (WSOD). In SSOD, annotators draw bounding boxes for only part of the training instances rather than for every instance in the training set. In WSOD, annotators provide only image-level tags indicating which object categories each training image contains, so detailed bounding-box annotations are no longer needed, but every training image must still be tagged. Along this line of research, this paper takes a further step toward reducing annotation effort and introduces the problem of object detection with semi-annotated weak labels (ODSAWL). Instead of requiring image-level tags on all training images, ODSAWL requires them for only a small portion; the object detectors are then learned from this small weakly labeled portion together with the remaining unlabeled training images. To address this challenging problem, the paper proposes a cross-model co-training framework in which an object localizer and a tag generator collaborate in an alternating optimization procedure. Specifically, during learning, these two (deep) models transfer the needed knowledge (including labels and visual patterns) to each other. The whole learning procedure is carried out in a few stages under the guidance of a progressive learning curriculum. We conduct comprehensive experiments on three benchmark datasets to demonstrate the effectiveness of the proposed approach, and the results are encouraging. Notably, using only about 15% weakly labeled training images, the proposed approach comes close to, or even outperforms, state-of-the-art WSOD methods.
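The alternating, staged procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the models, the confidence measure, or the curriculum schedule, so everything below is an assumption made for illustration. Two nearest-centroid classifiers over 1-D features stand in for the deep tag generator and object localizer, and the names `fit_centroids`, `predict`, `co_train`, and `admit_per_stage` are hypothetical. The sketch only shows the general pattern: two models take turns pseudo-labelling untagged samples, and each stage admits more pseudo-labels than the last.

```python
# Toy sketch only. Two nearest-centroid models stand in for the paper's
# deep "tag generator" and "object localizer"; all names are hypothetical.

def fit_centroids(samples, labels):
    """Return {label: mean feature} for a list of 1-D features."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest centroid."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) < 2:
        return best, 0.0
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
    return best, margin


def co_train(view_a, view_b, tags, admit_per_stage=(1, 2, 4)):
    """Alternate two models over a shared pool of (pseudo-)tags.

    view_a, view_b: per-sample features seen by model A and model B.
    tags: a label for tagged samples, None for untagged ones.
    admit_per_stage: assumed curriculum; how many of the most confident
    pseudo-labels each model may add per stage.
    """
    tags = list(tags)
    for admit in admit_per_stage:          # curriculum: admit more each stage
        for view in (view_a, view_b):      # the two models take turns
            labelled = [i for i, t in enumerate(tags) if t is not None]
            unlabelled = [i for i, t in enumerate(tags) if t is None]
            if not unlabelled:
                return tags
            model = fit_centroids([view[i] for i in labelled],
                                  [tags[i] for i in labelled])
            scored = [(predict(model, view[i]), i) for i in unlabelled]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            for (label, _), i in scored[:admit]:
                tags[i] = label            # knowledge passed to the other model
    return tags


# Eight samples, only two of which carry a tag.
view_a = [0.0, 0.2, 0.3, 0.1, 1.0, 0.8, 0.7, 0.9]
view_b = [0.1, 0.3, 0.2, 0.0, 0.9, 0.7, 0.8, 1.0]
tags = ["cat", None, None, None, "dog", None, None, None]
result = co_train(view_a, view_b, tags)
print(result)
```

In this toy setting the pseudo-labels added by one model become training data for the other on its next turn, which loosely mirrors the label transfer the abstract mentions; the transfer of visual patterns between deep models is not modelled here.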

Original language	English
Article number	8554285
Pages (from-to)	3622-3635
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	29
Issue number	12
DOIs	https://doi.org/10.1109/TCSVT.2018.2884173
State	Published - Dec 2019

Keywords

Computer vision
image processing
learning (artificial intelligence)
object detection

Access to Document

10.1109/TCSVT.2018.2884173

Cite this

@article{529cbee9fadf4c1ebbfa78fe5d27d45e,

title = "Learning Object Detectors With Semi-Annotated Weak Labels",

abstract = "For alleviating the human labor associated with annotating the training data for learning object detectors, recent research has focused on semi-supervised object detection (SSOD) and weakly supervised object detection (WSOD) approaches. In SSOD, instead of annotating all the instances in the whole training set, people only need to annotate the part of the training instances using bounding boxes. In WSOD, people need to annotate the image-level tags on all training images to indicate the object categories contained by the corresponding images since more detailed bounding box annotations are no longer needed. Along this line of research, this paper makes a further step to alleviate the human labor in annotating training data, leading to the problem of object detection with semi-annotated weak labels (ODSAWLs). Instead of labeling image-level tags on all training images, ODSAWL only needs the image-level tags for a small portion of the training images, and then, the object detectors can be learned from a small portion of the weakly-labeled training images and from the remaining unlabeled training images. To address such a challenging problem, this paper proposes a cross model co-training framework that collaborates an object localizer and a tag generator in an alternative optimization procedure. Specifically, during the learning procedure, these two (deep) models can transfer the needed knowledge (including labels and visual patterns) between each other. The whole learning procedure is accomplished in a few stages under the guidance of a progressive learning curriculum. To demonstrate the effectiveness of the proposed approach, we implement the comprehensive experiments on three benchmark datasets, where the obtained experimental results are quite encouraging. Notably, by using only about 15% weakly labeled training images, the proposed approach can effectively approach, or even outperform, the state-of-the-art WSOD methods.",

keywords = "Computer vision, image processing, learning (artificial intelligence), object detection",

author = "Dingwen Zhang and Junwei Han and Guangyu Guo and Long Zhao",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2019",

month = dec,

doi = "10.1109/TCSVT.2018.2884173",

language = "English",

volume = "29",

pages = "3622--3635",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "12",

}

TY - JOUR

T1 - Learning Object Detectors With Semi-Annotated Weak Labels

AU - Zhang, Dingwen

AU - Han, Junwei

AU - Guo, Guangyu

AU - Zhao, Long

PY - 2019/12

Y1 - 2019/12

AB - For alleviating the human labor associated with annotating the training data for learning object detectors, recent research has focused on semi-supervised object detection (SSOD) and weakly supervised object detection (WSOD) approaches. In SSOD, instead of annotating all the instances in the whole training set, people only need to annotate the part of the training instances using bounding boxes. In WSOD, people need to annotate the image-level tags on all training images to indicate the object categories contained by the corresponding images since more detailed bounding box annotations are no longer needed. Along this line of research, this paper makes a further step to alleviate the human labor in annotating training data, leading to the problem of object detection with semi-annotated weak labels (ODSAWLs). Instead of labeling image-level tags on all training images, ODSAWL only needs the image-level tags for a small portion of the training images, and then, the object detectors can be learned from a small portion of the weakly-labeled training images and from the remaining unlabeled training images. To address such a challenging problem, this paper proposes a cross model co-training framework that collaborates an object localizer and a tag generator in an alternative optimization procedure. Specifically, during the learning procedure, these two (deep) models can transfer the needed knowledge (including labels and visual patterns) between each other. The whole learning procedure is accomplished in a few stages under the guidance of a progressive learning curriculum. To demonstrate the effectiveness of the proposed approach, we implement the comprehensive experiments on three benchmark datasets, where the obtained experimental results are quite encouraging. Notably, by using only about 15% weakly labeled training images, the proposed approach can effectively approach, or even outperform, the state-of-the-art WSOD methods.

KW - Computer vision

KW - image processing

KW - learning (artificial intelligence)

KW - object detection

UR - http://www.scopus.com/inward/record.url?scp=85057806224&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2018.2884173

DO - 10.1109/TCSVT.2018.2884173

M3 - Article

AN - SCOPUS:85057806224

SN - 1051-8215

VL - 29

SP - 3622

EP - 3635

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 12

M1 - 8554285

ER -
