Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector

Dingwen Zhang; Junwei Han; Yu Zhang

doi:10.1109/ICCV.2017.436

School of Automation

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

136 Scopus citations

Abstract

In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is 'supervision by fusion', i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and an inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.
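
The 'supervision by fusion' idea in the abstract can be illustrated with a minimal sketch. It is not the paper's fusion procedure, which the abstract does not specify. It assumes each weak unsupervised saliency model outputs a map of values in [0, 1], swaps in plain pixel-wise averaging as the fusion rule, and uses disagreement between the weak models as a stand-in for a curriculum signal. The names `fuse_weak_maps` and `image_difficulty` are hypothetical.

```python
from statistics import mean, pstdev


def fuse_weak_maps(weak_maps):
    """Fuse several weak saliency maps of one image (values in [0, 1]).

    Returns (pseudo_gt, reliability): the pixel-wise mean of the maps and a
    per-pixel agreement score in [0, 1], where 1 means all weak models agree.
    Averaging is an illustrative assumption, not the paper's fusion rule.
    """
    height, width = len(weak_maps[0]), len(weak_maps[0][0])
    pseudo_gt = [[0.0] * width for _ in range(height)]
    reliability = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            votes = [m[r][c] for m in weak_maps]
            pseudo_gt[r][c] = mean(votes)
            # The population std of values in [0, 1] is at most 0.5,
            # so scaling by 2 maps the agreement score into [0, 1].
            reliability[r][c] = 1.0 - 2.0 * pstdev(votes)
    return pseudo_gt, reliability


def image_difficulty(reliability):
    """Hypothetical curriculum score: lower mean agreement = harder image."""
    flat = [v for row in reliability for v in row]
    return 1.0 - mean(flat)


# Two toy 1x2 "saliency maps": the models agree on pixel 0, disagree on pixel 1.
maps = [[[1.0, 0.0]], [[1.0, 1.0]]]
pseudo_gt, reliability = fuse_weak_maps(maps)
print(pseudo_gt, reliability, image_difficulty(reliability))
```

Under these assumptions, a training loop could weight each pixel's loss by `reliability` and present images in order of increasing `image_difficulty`, which is one simple way to turn fused weak predictions into both pseudo ground-truth and a learning curriculum.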

Original language	English
Title of host publication	Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	4068-4076
Number of pages	9
ISBN (Electronic)	9781538610329
DOIs	https://doi.org/10.1109/ICCV.2017.436
State	Published - 22 Dec 2017
Event	16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy. Duration: 22 Oct 2017 → 29 Oct 2017

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
Volume	2017-October
ISSN (Print)	1550-5499

Conference

Conference	16th IEEE International Conference on Computer Vision, ICCV 2017
Country/Territory	Italy
City	Venice
Period	22/10/17 → 29/10/17

Access to Document

10.1109/ICCV.2017.436

Cite this

Zhang, D., Han, J., & Zhang, Y. (2017). Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector. In Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017 (pp. 4068-4076). Article 8237698 (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2017-October). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV.2017.436

@inproceedings{0d50c26d5b6849e2b55dc903439cb8a6,

title = "Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector",

abstract = "In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is 'supervision by fusion', i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and an inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.",

author = "Dingwen Zhang and Junwei Han and Yu Zhang",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 16th IEEE International Conference on Computer Vision, ICCV 2017 ; Conference date: 22-10-2017 Through 29-10-2017",

year = "2017",

month = dec,

day = "22",

doi = "10.1109/ICCV.2017.436",

language = "English",

series = "Proceedings of the IEEE International Conference on Computer Vision",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "4068--4076",

booktitle = "Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017",

}

Zhang, D, Han, J & Zhang, Y 2017, Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector. in Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017., 8237698, Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-October, Institute of Electrical and Electronics Engineers Inc., pp. 4068-4076, 16th IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22/10/17. https://doi.org/10.1109/ICCV.2017.436

Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector. / Zhang, Dingwen; Han, Junwei; Zhang, Yu.
Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 4068-4076 8237698 (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2017-October).

TY - GEN

T1 - Supervision by Fusion

T2 - 16th IEEE International Conference on Computer Vision, ICCV 2017

AU - Zhang, Dingwen

AU - Han, Junwei

AU - Zhang, Yu

PY - 2017/12/22

Y1 - 2017/12/22

AB - In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is 'supervision by fusion', i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and an inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.

UR - http://www.scopus.com/inward/record.url?scp=85041899676&partnerID=8YFLogxK

U2 - 10.1109/ICCV.2017.436

DO - 10.1109/ICCV.2017.436

M3 - Conference contribution

AN - SCOPUS:85041899676

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 4068

EP - 4076

BT - Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 22 October 2017 through 29 October 2017

ER -

Zhang D, Han J, Zhang Y. Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector. In Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 4068-4076. 8237698. (Proceedings of the IEEE International Conference on Computer Vision). doi: 10.1109/ICCV.2017.436
