End-to-end video saliency detection via a deep contextual spatiotemporal network

Lina Wei, Shanshan Zhao, Omar Farouk Bourahla, Xi Li, Fei Wu, Yueting Zhuang, Junwei Han, Mingliang Xu

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

As an interesting and important problem in computer vision, learning-based video saliency detection aims to discover the visually interesting regions in a video sequence. Capturing the information within and between frames from different aspects (such as spatial context, motion information, temporal consistency across frames, and multiscale representation) is important for this task. A key issue is how to jointly model all these factors within a unified data-driven scheme in an end-to-end fashion. In this article, we propose an end-to-end spatiotemporal deep video saliency detection approach that captures both spatial context and motion characteristics. Furthermore, it encodes the temporal consistency across consecutive frames with a convolutional long short-term memory (Conv-LSTM) model. In addition, the multiscale saliency properties of each frame are adaptively integrated for the final saliency prediction in a collaborative feature-pyramid manner. Finally, the proposed approach unifies all the aforementioned components into a single end-to-end joint deep learning scheme. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.
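
The pipeline described in the abstract (per-frame spatial encoding, Conv-LSTM temporal modeling, and multiscale fusion for the final saliency map) can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' released implementation: the module names (`SpatiotemporalSaliencyNet`, `ConvLSTMCell`), the lightweight encoder, the two-scale fusion head, and all layer sizes are assumptions standing in for the paper's backbone, motion branch, and feature-pyramid module.

```python
# Minimal sketch of a Conv-LSTM-based spatiotemporal saliency network.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: all gates are computed with 2-D convolutions."""

    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # One convolution produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g          # update cell memory
        h = o * torch.tanh(c)      # update hidden state
        return h, c


class SpatiotemporalSaliencyNet(nn.Module):
    """Per-frame encoder -> Conv-LSTM -> two-scale fusion -> per-frame saliency map."""

    def __init__(self, hid_ch=32):
        super().__init__()
        self.hid_ch = hid_ch
        # Lightweight stand-in for the spatial-context encoder (e.g., a pretrained backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, hid_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.convlstm = ConvLSTMCell(hid_ch, hid_ch)
        # Multiscale heads: predict saliency at two scales, then fuse.
        self.head_full = nn.Conv2d(hid_ch, 1, 3, padding=1)
        self.head_half = nn.Conv2d(hid_ch, 1, 3, padding=1)

    def forward(self, clip):
        # clip: (batch, time, 3, H, W); H and W assumed divisible by 4.
        b, t, _, h, w = clip.shape
        hidden = clip.new_zeros(b, self.hid_ch, h // 4, w // 4)
        cell = clip.new_zeros(b, self.hid_ch, h // 4, w // 4)
        outputs = []
        for step in range(t):
            feat = self.encoder(clip[:, step])
            hidden, cell = self.convlstm(feat, (hidden, cell))
            # Saliency at the native feature scale and at half scale, fused additively.
            s_full = self.head_full(hidden)
            s_half = self.head_half(F.avg_pool2d(hidden, 2))
            s_half = F.interpolate(s_half, size=s_full.shape[-2:],
                                   mode="bilinear", align_corners=False)
            sal = torch.sigmoid(F.interpolate(s_full + s_half, size=(h, w),
                                              mode="bilinear", align_corners=False))
            outputs.append(sal)
        return torch.stack(outputs, dim=1)  # (batch, time, 1, H, W)


if __name__ == "__main__":
    model = SpatiotemporalSaliencyNet()
    dummy_clip = torch.randn(2, 4, 3, 64, 64)  # 2 clips, 4 frames each
    print(model(dummy_clip).shape)             # torch.Size([2, 4, 1, 64, 64])
```

Because the Conv-LSTM hidden state is carried across frames, the whole sequence can be trained end to end with an ordinary per-frame saliency loss, which is the property the abstract emphasizes.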

Original language: English
Article number: 09212602
Pages (from-to): 1691-1702
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 32
Issue number: 4
DOIs
State: Published - Apr 2021

Keywords

  • End-to-end spatiotemporal context modeling
  • Motion characteristics
  • Spatial context
  • Temporal consistency
  • Video saliency detection
