TY - JOUR
T1 - Attention-guided video super-resolution with recurrent multi-scale spatial–temporal transformer
AU - Sun, Wei
AU - Kong, Xianguang
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2023/8
Y1 - 2023/8
AB - Video super-resolution (VSR) aims to recover high-resolution (HR) content from low-resolution (LR) observations by compositing the spatial–temporal information in the LR frames, so propagating and aggregating spatial–temporal information is crucial. Recently, while transformers have shown impressive performance on high-level vision tasks, few attempts have been made to apply them to image restoration, especially to VSR. Moreover, previous transformers process spatial and temporal information simultaneously, which easily synthesizes confused textures, and their high computational cost limits their adoption. To this end, we construct a novel bidirectional recurrent VSR architecture. Our model disentangles the task of learning spatial–temporal information into two easier sub-tasks; each sub-task focuses on propagating and aggregating specific information with a multi-scale transformer-based design, which alleviates the difficulty of learning. Additionally, an attention-guided motion compensation module is applied to remove the influence of misalignment between frames. Experiments on three widely used benchmark datasets show that, owing to superior feature correlation learning, the proposed network outperforms previous state-of-the-art methods, especially at recovering fine details.
KW - Attention mechanism
KW - Motion compensation
KW - Spatial–temporal transformer
KW - Video super-resolution
UR - http://www.scopus.com/inward/record.url?scp=85144025937&partnerID=8YFLogxK
DO - 10.1007/s40747-022-00944-x
M3 - Article
AN - SCOPUS:85144025937
SN - 2199-4536
VL - 9
SP - 3989
EP - 4002
JO - Complex & Intelligent Systems
JF - Complex & Intelligent Systems
IS - 4
ER -