Synthesizing Supervision for Learning Deep Saliency Network without Human Annotation

Dingwen Zhang; Junwei Han; Yu Zhang; Dong Xu

doi:10.1109/TPAMI.2019.2900649

School of Automation

Research output: Contribution to journal › Article › peer-review

93 Citations (Scopus)

Abstract

Recently, the research field of salient object detection is undergoing a rapid and remarkable development along with the wide usage of deep neural networks. Being trained with a large number of images annotated with strong pixel-level ground-truth masks, the deep salient object detectors have achieved the state-of-the-art performance. However, it is expensive and time-consuming to provide the pixel-level ground-truth masks for each training image. To address this problem, this paper proposes one of the earliest frameworks to learn deep salient object detectors without requiring any human annotation. The supervisory signals used in our learning framework are generated through a novel supervision synthesis scheme, in which the key insights are 'knowledge source transition' and 'supervision by fusion'. Specifically, in the proposed learning framework, both the external knowledge source and the internal knowledge source are explored dynamically to provide informative cues for synthesizing supervision required in our approach, while a two-stream fusion mechanism is also established to implement the supervision synthesis process. Comprehensive experiments on four benchmark datasets demonstrate that the deep salient object detector trained by our newly proposed learning framework often works well without requiring any human annotated masks, which even approaches to its upper-bound obtained under the fully supervised learning fashion (within only 3 percent performance gap). Besides, we also apply the salient object detector learnt with our annotation-free learning framework to assist the weakly supervised semantic segmentation task, which demonstrates that our approach can also alleviate the heavy supplementary supervision required in the existing weakly supervised semantic segmentation framework.

Original language	English
Article number	8645692
Pages (from-to)	1755-1769
Number of pages	15
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	42
Issue number	7
DOI	https://doi.org/10.1109/TPAMI.2019.2900649
Publication status	Published - 1 Jul 2020

Cite this

@article{e7a302cd8c6b4d37883cfab0ce598250,

title = "Synthesizing Supervision for Learning Deep Saliency Network without Human Annotation",

abstract = "Recently, the research field of salient object detection is undergoing a rapid and remarkable development along with the wide usage of deep neural networks. Being trained with a large number of images annotated with strong pixel-level ground-truth masks, the deep salient object detectors have achieved the state-of-the-art performance. However, it is expensive and time-consuming to provide the pixel-level ground-truth masks for each training image. To address this problem, this paper proposes one of the earliest frameworks to learn deep salient object detectors without requiring any human annotation. The supervisory signals used in our learning framework are generated through a novel supervision synthesis scheme, in which the key insights are 'knowledge source transition' and 'supervision by fusion'. Specifically, in the proposed learning framework, both the external knowledge source and the internal knowledge source are explored dynamically to provide informative cues for synthesizing supervision required in our approach, while a two-stream fusion mechanism is also established to implement the supervision synthesis process. Comprehensive experiments on four benchmark datasets demonstrate that the deep salient object detector trained by our newly proposed learning framework often works well without requiring any human annotated masks, which even approaches to its upper-bound obtained under the fully supervised learning fashion (within only 3 percent performance gap). Besides, we also apply the salient object detector learnt with our annotation-free learning framework to assist the weakly supervised semantic segmentation task, which demonstrates that our approach can also alleviate the heavy supplementary supervision required in the existing weakly supervised semantic segmentation framework.",

keywords = "Salient object detection, annotation-free, supervision synthesis, weakly supervised semantic segmentation",

author = "Dingwen Zhang and Junwei Han and Yu Zhang and Dong Xu",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2020",

month = jul,

day = "1",

doi = "10.1109/TPAMI.2019.2900649",

language = "English",

volume = "42",

pages = "1755--1769",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "7",

}

TY - JOUR

T1 - Synthesizing Supervision for Learning Deep Saliency Network without Human Annotation

AU - Zhang, Dingwen

AU - Han, Junwei

AU - Zhang, Yu

AU - Xu, Dong

PY - 2020/7/1

Y1 - 2020/7/1

AB - Recently, the research field of salient object detection is undergoing a rapid and remarkable development along with the wide usage of deep neural networks. Being trained with a large number of images annotated with strong pixel-level ground-truth masks, the deep salient object detectors have achieved the state-of-the-art performance. However, it is expensive and time-consuming to provide the pixel-level ground-truth masks for each training image. To address this problem, this paper proposes one of the earliest frameworks to learn deep salient object detectors without requiring any human annotation. The supervisory signals used in our learning framework are generated through a novel supervision synthesis scheme, in which the key insights are 'knowledge source transition' and 'supervision by fusion'. Specifically, in the proposed learning framework, both the external knowledge source and the internal knowledge source are explored dynamically to provide informative cues for synthesizing supervision required in our approach, while a two-stream fusion mechanism is also established to implement the supervision synthesis process. Comprehensive experiments on four benchmark datasets demonstrate that the deep salient object detector trained by our newly proposed learning framework often works well without requiring any human annotated masks, which even approaches to its upper-bound obtained under the fully supervised learning fashion (within only 3 percent performance gap). Besides, we also apply the salient object detector learnt with our annotation-free learning framework to assist the weakly supervised semantic segmentation task, which demonstrates that our approach can also alleviate the heavy supplementary supervision required in the existing weakly supervised semantic segmentation framework.

KW - Salient object detection

KW - annotation-free

KW - supervision synthesis

KW - weakly supervised semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85086060874&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2019.2900649

DO - 10.1109/TPAMI.2019.2900649

M3 - Article

C2 - 30794509

AN - SCOPUS:85086060874

SN - 0162-8828

VL - 42

SP - 1755

EP - 1769

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 7

M1 - 8645692

ER -
