TY - GEN
T1 - Supervision by Fusion
T2 - 16th IEEE International Conference on Computer Vision, ICCV 2017
AU - Zhang, Dingwen
AU - Han, Junwei
AU - Zhang, Yu
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/12/22
Y1 - 2017/12/22
N2 - In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is 'supervision by fusion', i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and a inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.
AB - In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is 'supervision by fusion', i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and a inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.
UR - http://www.scopus.com/inward/record.url?scp=85041899676&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2017.436
DO - 10.1109/ICCV.2017.436
M3 - 会议稿件
AN - SCOPUS:85041899676
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 4068
EP - 4076
BT - Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 October 2017 through 29 October 2017
ER -