Video super-resolution via dense non-local spatial-temporal convolutional network

Research output: Contribution to journal › Article › peer-review


Abstract

In this paper, we present a novel end-to-end deep neural network for video super-resolution. In contrast to most previous methods, where frames must be warped for temporal alignment based on estimated optical flow, we propose short-temporal and bidirectional long-temporal blocks that exploit the spatial-temporal dependencies between frames. These blocks effectively model both sudden and smoothly varying motions in videos and overcome the limitations of explicit motion estimation. In addition, by introducing dense feature concatenation, we provide an effective way to combine low-level and high-level features, boosting the reconstruction of mid/high-frequency information, as shown in our analysis and experiments. Furthermore, we present a region-level non-local feature enhancement structure, which captures the spatial-temporal correlations between any two positions and exploits long-distance relevant information. Extensive evaluations and comparisons with current state-of-the-art approaches demonstrate the effectiveness of the proposed framework.
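The abstract describes the non-local enhancement only at a high level. As a rough, hypothetical sketch of the general idea, the following PyTorch snippet implements a standard embedded-Gaussian non-local block (after Wang et al.'s formulation), not the authors' exact region-level design; the class name and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock2D(nn.Module):
    """Generic embedded-Gaussian non-local block (a sketch, not the
    paper's region-level module): it relates any two spatial positions
    of a feature map via a full pairwise affinity matrix."""

    def __init__(self, in_channels, inter_channels=None):
        super().__init__()
        inter_channels = inter_channels or in_channels // 2
        self.theta = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.g = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.out = nn.Conv2d(inter_channels, in_channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Project to query/key/value and flatten spatial positions.
        q = self.theta(x).view(b, -1, h * w).permute(0, 2, 1)  # (b, hw, c')
        k = self.phi(x).view(b, -1, h * w)                     # (b, c', hw)
        v = self.g(x).view(b, -1, h * w).permute(0, 2, 1)      # (b, hw, c')
        # Affinity between every pair of positions, then aggregate values.
        attn = F.softmax(torch.matmul(q, k), dim=-1)           # (b, hw, hw)
        y = torch.matmul(attn, v).permute(0, 2, 1).contiguous()
        y = y.view(b, -1, h, w)
        # Residual connection preserves the original features.
        return x + self.out(y)

# Usage: block = NonLocalBlock2D(64); y = block(torch.randn(1, 64, 32, 32))
```

A region-level variant, as the paper suggests, would pool the feature map into patches before computing the affinity matrix, reducing the hw × hw attention cost while still capturing long-distance correlations.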

Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: Neurocomputing
Volume: 403
DOIs
State: Published - 25 Aug 2020

Keywords

  • ConvLSTM
  • Dense concatenation
  • Non-local
  • Video super-resolution
