TY - GEN
T1 - Predicting eye fixations using convolutional neural networks
AU - Liu, Nian
AU - Han, Junwei
AU - Zhang, Dingwen
AU - Wen, Shifeng
AU - Liu, Tianming
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/10/14
Y1 - 2015/10/14
N2 - It is believed that eye movements in free-viewing of natural scenes are directed by both bottom-up visual saliency and top-down visual factors. In this paper, we propose a novel computational framework to simultaneously learn these two types of visual features from raw image data using a multiresolution convolutional neural network (Mr-CNN) for predicting eye fixations. The Mr-CNN is directly trained from image regions centered on fixation and non-fixation locations over multiple resolutions, using raw image pixels as inputs and eye fixation attributes as labels. Diverse top-down visual features can be learned in higher layers. Meanwhile, bottom-up visual saliency can also be inferred by combining information over multiple resolutions. Finally, optimal integration of bottom-up and top-down cues can be learned in the last logistic regression layer to predict eye fixations. The proposed approach achieves state-of-the-art results over four publicly available benchmark datasets, demonstrating the superiority of our work.
UR - http://www.scopus.com/inward/record.url?scp=84946554818&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2015.7298633
DO - 10.1109/CVPR.2015.7298633
M3 - Conference contribution
AN - SCOPUS:84946554818
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 362
EP - 370
BT - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
PB - IEEE Computer Society
T2 - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Y2 - 7 June 2015 through 12 June 2015
ER -