Attention-guided dual spatial-temporal non-local network for video super-resolution

Wei Sun, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

In this paper, we propose an attention-guided dual spatial-temporal non-local network for video super-resolution (ADNLVSR). We integrate temporal and spatial non-local self-similar contexts from consecutive video frames after motion compensation, and merge features of different levels discriminatively with a channel attention mechanism for the target frame. During motion compensation, unlike previous methods that directly stack input images or features for merging, we use a learnable attention mechanism to guide the merging, which suppresses undesired components caused by misalignment and enhances desirable fine details. During feature fusion, in contrast to most previous approaches that consider only global-level non-local self-similarity in space or time, we propose region-level spatial and temporal non-local operations to exploit temporal correlations and enhance similar spatial structures. Based on our analysis, the proposed modules effectively avoid the computational burden of existing global-level non-local operations while enhancing correlated structure information. In addition, we propose a channel attention-guided residual dense block (CRDB), in which a second-order channel attention mechanism adaptively rescales the channel-wise features for more discriminative representations. Extensive experiments on different datasets demonstrate performance superior to published state-of-the-art video super-resolution methods.
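The region-level non-local idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, the plain dot-product affinity (no learned embeddings), and the block partitioning scheme are all assumptions made for illustration. It shows why restricting the non-local operation to regions cuts the cost from O((HW)²) for a global operation to O(R·(hw)²) over R blocks of hw positions each.

```python
import numpy as np

def region_nonlocal(feat, region=4):
    """Hypothetical region-level non-local operation.

    feat: (C, H, W) feature map; H and W assumed divisible by `region`.
    Within each region x region spatial block, every position is replaced by a
    softmax-weighted sum over all positions in that block, then added back to
    the input via a residual connection. Each block costs O((region^2)^2),
    versus O((H*W)^2) for a single global non-local operation.
    """
    C, H, W = feat.shape
    out = feat.copy()
    for y0 in range(0, H, region):
        for x0 in range(0, W, region):
            block = feat[:, y0:y0 + region, x0:x0 + region].reshape(C, -1)  # (C, N)
            # Pairwise position affinities (dot product stands in for the
            # learned embedded similarity a trained network would use).
            aff = block.T @ block                        # (N, N)
            aff = aff - aff.max(axis=1, keepdims=True)   # numerical stability
            w = np.exp(aff)
            w /= w.sum(axis=1, keepdims=True)            # softmax over positions
            agg = block @ w.T                            # (C, N) aggregated features
            out[:, y0:y0 + region, x0:x0 + region] += agg.reshape(C, region, region)
    return out
```

For a constant feature map the softmax weights are uniform, so the aggregation reproduces the input and the residual doubles it, which is an easy sanity check on the weighting.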
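The second-order channel attention inside the CRDB can likewise be sketched. Again this is an assumption-laden illustration, not the paper's network: the bottleneck width, the random matrices standing in for learned fully-connected layers, and the simple row-mean pooling of the covariance are all placeholders. The point it conveys is that the channel descriptor comes from second-order statistics (the channel covariance) rather than plain global average pooling, and the resulting sigmoid gates rescale each channel.

```python
import numpy as np

def second_order_channel_attention(feat, reduction=2, rng=None):
    """Hypothetical second-order channel attention gate.

    feat: (C, H, W). Channel descriptors are pooled from the channel
    covariance matrix (second-order statistics); a small bottleneck with
    stand-in random weights (learned FC layers in a real network) produces
    per-channel scales in (0, 1) that rescale the feature map.
    """
    C, H, W = feat.shape
    X = feat.reshape(C, -1)
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / (H * W - 1)                 # (C, C) channel covariance
    z = cov.mean(axis=1)                        # pool each row -> (C,) descriptor
    rng = np.random.default_rng(0) if rng is None else rng
    W1 = rng.standard_normal((C // reduction, C)) * 0.1  # stand-in for learned FC
    W2 = rng.standard_normal((C, C // reduction)) * 0.1  # stand-in for learned FC
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))  # sigmoid gates
    return feat * s[:, None, None]              # channel-wise rescaling
```

Because the gates lie strictly in (0, 1), every channel is attenuated rather than amplified, which is what "adaptively rescale the channel-wise features" amounts to in this simplified form.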

Original language: English
Pages (from-to): 24-33
Number of pages: 10
Journal: Neurocomputing
Volume: 406
DOI
Publication status: Published - 17 Sep 2020
