End-to-end video saliency detection via a deep contextual spatiotemporal network

Lina Wei, Shanshan Zhao, Omar Farouk Bourahla, Xi Li, Fei Wu, Yueting Zhuang, Junwei Han, Mingliang Xu

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

As an interesting and important problem in computer vision, learning-based video saliency detection aims to discover the visually interesting regions in a video sequence. Capturing the information within and between frames from different aspects (such as spatial context, motion information, temporal consistency across frames, and multiscale representation) is important for this task. A key issue is how to jointly model all these factors within a unified data-driven scheme in an end-to-end fashion. In this article, we propose an end-to-end spatiotemporal deep video saliency detection approach that captures both spatial context and motion characteristics. Furthermore, it encodes the temporal consistency across consecutive frames with a convolutional long short-term memory (Conv-LSTM) model. In addition, the multiscale saliency properties of each frame are adaptively integrated for the final saliency prediction in a collaborative feature-pyramid manner. Finally, the proposed approach unifies all the aforementioned components into a single end-to-end joint deep learning scheme. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.
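
The pipeline described in the abstract (per-frame spatial encoding, Conv-LSTM temporal modeling, and multiscale fusion for the final saliency map) can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' released implementation: the module names (`SpatiotemporalSaliencyNet`, `ConvLSTMCell`), the lightweight encoder, the two-scale fusion head, and all layer sizes are assumptions standing in for the paper's backbone, motion branch, and feature-pyramid module.

```python
# Minimal sketch of a Conv-LSTM-based spatiotemporal saliency network.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: all gates are computed with 2-D convolutions."""

    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # One convolution produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g          # update cell memory
        h = o * torch.tanh(c)      # update hidden state
        return h, c


class SpatiotemporalSaliencyNet(nn.Module):
    """Per-frame encoder -> Conv-LSTM -> two-scale fusion -> per-frame saliency map."""

    def __init__(self, hid_ch=32):
        super().__init__()
        self.hid_ch = hid_ch
        # Lightweight stand-in for the spatial-context encoder (e.g., a pretrained backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, hid_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.convlstm = ConvLSTMCell(hid_ch, hid_ch)
        # Multiscale heads: predict saliency at two scales, then fuse.
        self.head_full = nn.Conv2d(hid_ch, 1, 3, padding=1)
        self.head_half = nn.Conv2d(hid_ch, 1, 3, padding=1)

    def forward(self, clip):
        # clip: (batch, time, 3, H, W); H and W assumed divisible by 4.
        b, t, _, h, w = clip.shape
        hidden = clip.new_zeros(b, self.hid_ch, h // 4, w // 4)
        cell = clip.new_zeros(b, self.hid_ch, h // 4, w // 4)
        outputs = []
        for step in range(t):
            feat = self.encoder(clip[:, step])
            hidden, cell = self.convlstm(feat, (hidden, cell))
            # Saliency at the native feature scale and at half scale, fused additively.
            s_full = self.head_full(hidden)
            s_half = self.head_half(F.avg_pool2d(hidden, 2))
            s_half = F.interpolate(s_half, size=s_full.shape[-2:],
                                   mode="bilinear", align_corners=False)
            sal = torch.sigmoid(F.interpolate(s_full + s_half, size=(h, w),
                                              mode="bilinear", align_corners=False))
            outputs.append(sal)
        return torch.stack(outputs, dim=1)  # (batch, time, 1, H, W)


if __name__ == "__main__":
    model = SpatiotemporalSaliencyNet()
    dummy_clip = torch.randn(2, 4, 3, 64, 64)  # 2 clips, 4 frames each
    print(model(dummy_clip).shape)             # torch.Size([2, 4, 1, 64, 64])
```

Because the Conv-LSTM hidden state is carried across frames, the whole sequence can be trained end to end with an ordinary per-frame saliency loss, which is the property the abstract emphasizes.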

Original language: English
Article number: 09212602
Pages (from-to): 1691-1702
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 32
Issue number: 4
DOIs
State: Published - Apr 2021

Keywords

  • End-to-end spatiotemporal context modeling
  • Motion characteristics
  • Spatial context
  • Temporal consistency
  • Video saliency detection
