Cascaded transformer U-net for image restoration

Longbin Yan; Min Zhao; Shumin Liu; Shuaikai Shi; Jie Chen

doi:10.1016/j.sigpro.2022.108902

School of Marine Science and Technology

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

Image restoration is one of the most important computer vision tasks; it aims to recover high-quality images from degraded or low-quality observations. Restoration methods based on convolutional neural networks (CNNs) have achieved attractive performance. However, because convolutions take in only local information, CNN-based methods are limited in modeling long-range objects and extracting global information. In addition, existing one-stage methods lose performance because they lack diversified receptive fields. In this paper, we propose a multi-stage cascaded transformer architecture for image restoration. First, a Swin transformer based encoder relying on self-attention is used to improve the modeling of long-range objects and to output hierarchical multi-level semantic features. Then, a shape perceiving module is designed and embedded in the decoder to enhance the representation of irregular objects. Moreover, a multi-stage cascaded encoder-decoder architecture possessing diversified receptive fields is proposed to progressively obtain fine restoration results and thus boost performance. We conduct extensive experiments on image deraining, underwater image enhancement, near infrared image colorization, and low-light image enhancement. The results show that our proposed method can achieve comparable or better performance than state-of-the-art methods at lower training and inference cost.
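The progressive, multi-stage idea described in the abstract can be illustrated with a small sketch. This is not the authors' network: `make_toy_stage`, `cascade`, and `total_variation` are hypothetical names, the "image" is a 1-D list of floats, and each stage is a simple smoothing residual standing in for a full Swin-based encoder-decoder. It also assumes each stage receives only the previous stage's output. It shows only the control flow of a cascade, where every stage refines the result of the one before it.

```python
from typing import Callable, List, Sequence

Image = List[float]               # toy 1-D "image"; a real model would use tensors
Stage = Callable[[Image], Image]  # one restoration stage (assumed interface)


def make_toy_stage(strength: float) -> Stage:
    """Hypothetical stand-in for one encoder-decoder stage.

    It adds a residual that pulls each sample toward the mean of its
    3-sample neighbourhood (borders are replicated).
    """
    def stage(x: Image) -> Image:
        n = len(x)
        out = []
        for i in range(n):
            left = x[max(i - 1, 0)]
            right = x[min(i + 1, n - 1)]
            local_mean = (left + x[i] + right) / 3.0
            out.append(x[i] + strength * (local_mean - x[i]))
        return out
    return stage


def cascade(stages: Sequence[Stage], degraded: Image) -> List[Image]:
    """Run the stages in order and return every intermediate output."""
    outputs = []
    current = degraded
    for stage in stages:
        current = stage(current)  # each stage refines the previous result
        outputs.append(current)
    return outputs


def total_variation(x: Image) -> float:
    """Sum of absolute differences between neighbours (a roughness measure)."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


if __name__ == "__main__":
    noisy = [0.0, 1.0, 0.0, 1.0]
    results = cascade([make_toy_stage(0.5)] * 3, noisy)
    for k, out in enumerate(results, start=1):
        print(f"stage {k}: roughness = {total_variation(out):.3f}")
```

Returning every intermediate output, rather than only the last one, is a design choice made here so that per-stage results can be inspected or supervised; whether the paper does so is not stated in the abstract.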

Original language	English
Article number	108902
Journal	Signal Processing
Volume	206
DOIs	https://doi.org/10.1016/j.sigpro.2022.108902
State	Published - May 2023

Keywords

Encoder-decoder structure
Image deraining
Long-range dependence modeling
Near infrared image colorization
Underwater image enhancement

Cite this

@article{6617300b0cb74708b0790de3f0d42905,

title = "Cascaded transformer U-net for image restoration",

abstract = "Image restoration is one of the most important computer vision tasks; it aims to recover high-quality images from degraded or low-quality observations. Restoration methods based on convolutional neural networks (CNNs) have achieved attractive performance. However, because convolutions take in only local information, CNN-based methods are limited in modeling long-range objects and extracting global information. In addition, existing one-stage methods lose performance because they lack diversified receptive fields. In this paper, we propose a multi-stage cascaded transformer architecture for image restoration. First, a Swin transformer based encoder relying on self-attention is used to improve the modeling of long-range objects and to output hierarchical multi-level semantic features. Then, a shape perceiving module is designed and embedded in the decoder to enhance the representation of irregular objects. Moreover, a multi-stage cascaded encoder-decoder architecture possessing diversified receptive fields is proposed to progressively obtain fine restoration results and thus boost performance. We conduct extensive experiments on image deraining, underwater image enhancement, near infrared image colorization, and low-light image enhancement. The results show that our proposed method can achieve comparable or better performance than state-of-the-art methods at lower training and inference cost.",

keywords = "Encoder-decoder structure, Image deraining, Long-range dependence modeling, Near infrared image colorization, Underwater image enhancement",

author = "Longbin Yan and Min Zhao and Shumin Liu and Shuaikai Shi and Jie Chen",

note = "Publisher Copyright: {\textcopyright} 2022",

year = "2023",

month = may,

doi = "10.1016/j.sigpro.2022.108902",

language = "English",

volume = "206",

journal = "Signal Processing",

issn = "0165-1684",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Cascaded transformer U-net for image restoration

AU - Yan, Longbin

AU - Zhao, Min

AU - Liu, Shumin

AU - Shi, Shuaikai

AU - Chen, Jie

PY - 2023/5

Y1 - 2023/5

AB - Image restoration is one of the most important computer vision tasks; it aims to recover high-quality images from degraded or low-quality observations. Restoration methods based on convolutional neural networks (CNNs) have achieved attractive performance. However, because convolutions take in only local information, CNN-based methods are limited in modeling long-range objects and extracting global information. In addition, existing one-stage methods lose performance because they lack diversified receptive fields. In this paper, we propose a multi-stage cascaded transformer architecture for image restoration. First, a Swin transformer based encoder relying on self-attention is used to improve the modeling of long-range objects and to output hierarchical multi-level semantic features. Then, a shape perceiving module is designed and embedded in the decoder to enhance the representation of irregular objects. Moreover, a multi-stage cascaded encoder-decoder architecture possessing diversified receptive fields is proposed to progressively obtain fine restoration results and thus boost performance. We conduct extensive experiments on image deraining, underwater image enhancement, near infrared image colorization, and low-light image enhancement. The results show that our proposed method can achieve comparable or better performance than state-of-the-art methods at lower training and inference cost.

KW - Encoder-decoder structure

KW - Image deraining

KW - Long-range dependence modeling

KW - Near infrared image colorization

KW - Underwater image enhancement

UR - http://www.scopus.com/inward/record.url?scp=85145973736&partnerID=8YFLogxK

U2 - 10.1016/j.sigpro.2022.108902

DO - 10.1016/j.sigpro.2022.108902

M3 - Article

AN - SCOPUS:85145973736

SN - 0165-1684

VL - 206

JO - Signal Processing

JF - Signal Processing

M1 - 108902

ER -
