Learning Video Salient Object Detection Progressively from Unlabeled Videos

Binwei Xu, Qiuping Jiang, Haoran Liang, Dingwen Zhang, Ronghua Liang, Peng Chen

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Recently, deep learning-based video salient object detection (VSOD) has achieved notable breakthroughs, but these methods rely on videos with expensive pixel-wise or weak annotations. In this paper, building on the similarities and differences between VSOD and image salient object detection (SOD), we propose a novel VSOD method via a progressive framework that locates and then segments salient objects without using any video annotation. To efficiently transfer the knowledge learned on SOD datasets to VSOD, we introduce dynamic saliency to compensate for SOD's lack of motion information during the locating process while keeping the fine segmenting process unchanged. Specifically, we use a coarse locating model trained on an image dataset to identify frames exhibiting both static and dynamic saliency, and select the locating results of these frames as spatiotemporal location labels. Moreover, by tracking salient objects across adjacent frames, we increase the number of spatiotemporal location labels. On the basis of these labels, a two-stream locating network with an optical flow branch is proposed to capture salient objects in videos. Results on five public benchmarks demonstrate that our method outperforms state-of-the-art weakly supervised and unsupervised methods.
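The frame-selection step described in the abstract — keeping only frames where static (image-based) and dynamic (motion-based) saliency agree — can be sketched as follows. This is a minimal illustration assuming saliency maps are normalized NumPy arrays; the function name, threshold values, and IoU-based agreement measure are our assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_pseudo_label_frames(static_maps, dynamic_maps, iou_thresh=0.5):
    """Keep indices of frames whose static and dynamic saliency maps
    agree; the locating results of such frames could then serve as
    spatiotemporal location labels. Illustrative sketch only."""
    selected = []
    for idx, (s, d) in enumerate(zip(static_maps, dynamic_maps)):
        s_bin = s > 0.5          # binarize static (image-model) saliency
        d_bin = d > 0.5          # binarize dynamic (motion-cue) saliency
        inter = np.logical_and(s_bin, d_bin).sum()
        union = np.logical_or(s_bin, d_bin).sum()
        iou = inter / union if union > 0 else 0.0
        if iou >= iou_thresh:    # both cues fire on the same region
            selected.append(idx)
    return selected
```

In this sketch, agreement is measured by intersection-over-union of the binarized maps; any comparable consistency score between the two saliency cues would fit the same role.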

Original language: English
Journal: IEEE Transactions on Multimedia
DOIs
State: Accepted/In press - 2024

Keywords

  • Location
  • Optical flow
  • Segmentation
  • Video salient object detection
  • Weakly supervised learning

