Exploring Rich and Efficient Spatial Temporal Interactions for Real-Time Video Salient Object Detection

  • Chenglizhao Chen
  • Guotao Wang
  • Chong Peng
  • Yuming Fang
  • Dingwen Zhang
  • Hong Qin
  • Qingdao University
  • Jiangxi University of Finance and Economics
  • Xidian University
  • Stony Brook University

Research output: Contribution to journal › Article › peer-review

126 Citations (Scopus)

Abstract

We have witnessed a growing interest in video salient object detection (VSOD) techniques in today's computer vision applications. In contrast with temporal information (which is still considered a rather unstable source), spatial information is more stable and ubiquitous, and thus it could influence our vision system more. As a result, current mainstream VSOD approaches infer their saliency primarily from the spatial perspective, still treating temporal information as subordinate. Although this spatially focused methodology is effective in achieving numeric performance gains, it has two critical limitations. First, to ensure the dominance of spatial information, its temporal counterpart remains inadequately used, even though in some complex video scenes the temporal information may be the only reliable data source and is therefore critical for deriving correct VSOD. Second, spatial and temporal saliency cues are often computed independently in advance and integrated later, while the interactions between them are omitted completely, resulting in saliency cues of limited quality. To address these challenges, this paper advocates a novel spatiotemporal network whose key innovation is the design of its temporal unit. Compared with existing alternatives (e.g., convLSTM), the proposed temporal unit has an extremely lightweight design that does not degrade its strong ability to sense temporal information. Furthermore, it fully enables the computation of temporal saliency cues that interact with their spatial counterparts, ultimately boosting overall VSOD performance and allowing the spatial and temporal cues to improve each other. The proposed method is easy to implement yet effective, achieving high-quality VSOD at 50 FPS in real-time applications.
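To make the cost comparison with convLSTM concrete, the sketch below counts learnable parameters for a standard ConvLSTM cell (four gates, no peephole connections) and for a single convolution applied to concatenated current and previous features. The single-convolution unit, the function names, and the sizes used (64 channels, 3×3 kernels) are illustrative assumptions only; this is not the temporal unit proposed in the paper, whose design the abstract does not specify.

```python
def convlstm_params(c_in: int, c_hidden: int, k: int) -> int:
    """Weights + biases of a standard ConvLSTM cell without peephole connections.

    Each of the four gates (input, forget, output, candidate) applies a k x k
    convolution to the concatenation of the input and the previous hidden state.
    """
    per_gate = k * k * (c_in + c_hidden) * c_hidden + c_hidden
    return 4 * per_gate


def single_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights + biases of one k x k convolution (hypothetical lightweight unit)."""
    return k * k * c_in * c_out + c_out


if __name__ == "__main__":
    c, k = 64, 3  # assumed channel width and kernel size
    heavy = convlstm_params(c, c, k)
    # Hypothetical light unit: one conv over concatenated [current, previous] features.
    light = single_conv_params(2 * c, c, k)
    print(heavy, light, heavy / light)
```

Running it prints `295168 73792 4.0`: with equal channel widths, the four ConvLSTM gates need four times the parameters of one convolution of the same shape, which is one reason a gate-free temporal unit can be much cheaper.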

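Original language: English
Article number: 9390381
Pages (from-to): 3995-4007
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Volume: 30
DOI
Publication status: Published - 2021
Externally published: Yes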
