Learning Bilateral Cost Volume for Rolling Shutter Temporal Super-Resolution

Bin Fan; Yuchao Dai; Hongdong Li

doi:10.1109/TPAMI.2024.3350900

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Rolling shutter temporal super-resolution (RSSR), which aims to synthesize intermediate global shutter (GS) video frames between two consecutive rolling shutter (RS) frames, has made remarkable progress with the development of deep convolutional neural networks in recent years. Existing methods cascade multiple separate networks to sequentially estimate intermediate motion fields and synthesize target GS frames. Nevertheless, they are typically complex, do not facilitate the interaction of complementary motion and appearance information, and suffer from problems such as pixel aliasing or poor interpretability. In this paper, we derive uniform bilateral motion fields for RS-aware backward warping, which endows our network with a more explicit geometric meaning by injecting spatio-temporal consistency information through time-offset embedding. More importantly, we develop a unified, single-stage RSSR pipeline to recover the latent GS video in a coarse-to-fine manner. It first extracts pyramid features from the given inputs, and then refines the bilateral motion fields together with the anchor frame until the desired output is generated. With the help of our proposed bilateral cost volume, which uses the anchor frame as a common reference to model the correlation with two RS frames, the gradually refined anchor frames not only facilitate intermediate motion estimation, but also compensate for contextual details, making additional frame synthesis or refinement networks unnecessary. Meanwhile, an asymmetric bilateral motion model built on top of the symmetric bilateral motion model further improves generality and adaptability, yielding better GS video reconstruction performance. Extensive quantitative and qualitative experiments on synthetic and real data demonstrate that our method achieves new state-of-the-art results.
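The abstract describes the bilateral cost volume only at a high level: the anchor frame serves as a common reference whose correlation with both RS frames is modelled. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It uses a plain local dot-product correlation over a small search window and simply concatenates the two resulting volumes; the function names, the `radius` parameter, the zero padding at borders, and the concatenation step are all hypothetical choices made for illustration.

```python
from typing import List

# A feature map indexed as [row][col][channel] (hypothetical layout).
FeatureMap = List[List[List[float]]]


def correlation_volume(ref: FeatureMap, tgt: FeatureMap, radius: int) -> FeatureMap:
    """Correlate each ref pixel with tgt pixels in a (2r+1) x (2r+1) window.

    Returns a map of shape [H][W][(2r+1)^2]; displacements that fall
    outside the image contribute a cost of 0.0 (assumed zero padding).
    """
    h, w, c = len(ref), len(ref[0]), len(ref[0][0])
    volume = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ty, tx = y + dy, x + dx
                    if 0 <= ty < h and 0 <= tx < w:
                        # Channel-normalised dot product as the matching cost.
                        dot = sum(a * b for a, b in zip(ref[y][x], tgt[ty][tx]))
                        costs.append(dot / c)
                    else:
                        costs.append(0.0)
            row.append(costs)
        volume.append(row)
    return volume


def bilateral_cost_volume(anchor: FeatureMap, rs0: FeatureMap,
                          rs1: FeatureMap, radius: int = 1) -> FeatureMap:
    """Use the anchor as the shared reference against both RS feature maps.

    Assumption: the two correlation volumes are concatenated per pixel.
    """
    vol0 = correlation_volume(anchor, rs0, radius)
    vol1 = correlation_volume(anchor, rs1, radius)
    return [[vol0[y][x] + vol1[y][x] for x in range(len(anchor[0]))]
            for y in range(len(anchor))]
```

In this toy form, a feature that sits one pixel to the left in the first RS map and one pixel to the right in the second produces one peak in each half of the concatenated cost vector at the anchor pixel, which is the kind of two-sided motion cue a bilateral motion estimator could consume. The paper's pipeline additionally involves pyramid features, RS-aware backward warping, and time-offset embedding, none of which are modelled here.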

Original language	English
Article number	10382595
Pages (from-to)	3862-3879
Number of pages	18
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	46
Issue number	5
DOIs	https://doi.org/10.1109/TPAMI.2024.3350900
State	Published - 1 May 2024

Keywords

Bilateral cost volume
deep learning
geometric vision
rolling shutter correction
temporal super-resolution

Cite this

@article{04da299808bb4805ad835e46c6001eb2,

title = "Learning Bilateral Cost Volume for Rolling Shutter Temporal Super-Resolution",

abstract = "Rolling shutter temporal super-resolution (RSSR), which aims to synthesize intermediate global shutter (GS) video frames between two consecutive rolling shutter (RS) frames, has made remarkable progress with the development of deep convolutional neural networks in recent years. Existing methods cascade multiple separate networks to sequentially estimate intermediate motion fields and synthesize target GS frames. Nevertheless, they are typically complex, do not facilitate the interaction of complementary motion and appearance information, and suffer from problems such as pixel aliasing or poor interpretability. In this paper, we derive uniform bilateral motion fields for RS-aware backward warping, which endows our network with a more explicit geometric meaning by injecting spatio-temporal consistency information through time-offset embedding. More importantly, we develop a unified, single-stage RSSR pipeline to recover the latent GS video in a coarse-to-fine manner. It first extracts pyramid features from the given inputs, and then refines the bilateral motion fields together with the anchor frame until the desired output is generated. With the help of our proposed bilateral cost volume, which uses the anchor frame as a common reference to model the correlation with two RS frames, the gradually refined anchor frames not only facilitate intermediate motion estimation, but also compensate for contextual details, making additional frame synthesis or refinement networks unnecessary. Meanwhile, an asymmetric bilateral motion model built on top of the symmetric bilateral motion model further improves generality and adaptability, yielding better GS video reconstruction performance. Extensive quantitative and qualitative experiments on synthetic and real data demonstrate that our method achieves new state-of-the-art results.",

keywords = "Bilateral cost volume, deep learning, geometric vision, rolling shutter correction, temporal super-resolution",

author = "Bin Fan and Yuchao Dai and Hongdong Li",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2024",

month = may,

day = "1",

doi = "10.1109/TPAMI.2024.3350900",

language = "English",

volume = "46",

pages = "3862--3879",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "5",

}

TY - JOUR

T1 - Learning Bilateral Cost Volume for Rolling Shutter Temporal Super-Resolution

AU - Fan, Bin

AU - Dai, Yuchao

AU - Li, Hongdong

PY - 2024/5/1

Y1 - 2024/5/1

AB - Rolling shutter temporal super-resolution (RSSR), which aims to synthesize intermediate global shutter (GS) video frames between two consecutive rolling shutter (RS) frames, has made remarkable progress with the development of deep convolutional neural networks in recent years. Existing methods cascade multiple separate networks to sequentially estimate intermediate motion fields and synthesize target GS frames. Nevertheless, they are typically complex, do not facilitate the interaction of complementary motion and appearance information, and suffer from problems such as pixel aliasing or poor interpretability. In this paper, we derive uniform bilateral motion fields for RS-aware backward warping, which endows our network with a more explicit geometric meaning by injecting spatio-temporal consistency information through time-offset embedding. More importantly, we develop a unified, single-stage RSSR pipeline to recover the latent GS video in a coarse-to-fine manner. It first extracts pyramid features from the given inputs, and then refines the bilateral motion fields together with the anchor frame until the desired output is generated. With the help of our proposed bilateral cost volume, which uses the anchor frame as a common reference to model the correlation with two RS frames, the gradually refined anchor frames not only facilitate intermediate motion estimation, but also compensate for contextual details, making additional frame synthesis or refinement networks unnecessary. Meanwhile, an asymmetric bilateral motion model built on top of the symmetric bilateral motion model further improves generality and adaptability, yielding better GS video reconstruction performance. Extensive quantitative and qualitative experiments on synthetic and real data demonstrate that our method achieves new state-of-the-art results.

KW - Bilateral cost volume

KW - deep learning

KW - geometric vision

KW - rolling shutter correction

KW - temporal super-resolution

UR - http://www.scopus.com/inward/record.url?scp=85182387186&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3350900

DO - 10.1109/TPAMI.2024.3350900

M3 - Article

C2 - 38190689

AN - SCOPUS:85182387186

SN - 0162-8828

VL - 46

SP - 3862

EP - 3879

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 5

M1 - 10382595

ER -
