Detail-Preserving and Diverse Image Translation for Adverse Visual Object Detection

Guolong Sun; Zhitong Xiong; Yuan Yuan

doi:10.1109/TCSVT.2024.3398145

Institute of Optoelectronics and Intelligence

Research output: Contribution to journal › Article › Peer-review

Abstract

The effectiveness of object detection is significantly hampered in challenging nighttime or rainy scenarios because of the severe domain shift between daytime and adverse-visual images. Previous work has shown that image-to-image translation used for data augmentation can effectively mitigate domain shifts, but such methods may still fail to preserve image objects under extreme adverse conditions such as rainy nights. In addition, achieving diversity in the generated results remains challenging. To this end, we propose a Progressive Adverse Image Translation (PAIT) framework that tackles domain shifts by generating diverse, detail-preserving images. The main contributions of this paper are as follows. 1) We propose a novel PAIT framework that incorporates an iterative mapping module and a slicing layer, enabling the progressive generation of increasingly challenging images in a fine-to-coarse manner. 2) To preserve image details, we introduce an iterative mapping module that generates smooth style transform curves. 3) To enhance the diversity of synthesized images, we propose a simple yet efficient end-to-end optimization method. 4) Through quantitative analysis, we find a strong correlation between the style diversity of augmented images and the performance of the detection model, highlighting the crucial role of style diversity in improving the model’s generalizability. Our framework achieves state-of-the-art performance on multiple challenging visual datasets, surpassing the previous state-of-the-art methods by 27% (+8.0 AP). Moreover, our approach and modules can be easily extended to different detectors and other domain adaptation methods, making it a versatile solution for object detection in adverse visual environments. Our code will be available at https://github.com/ssunguotu/Diverse-Aug.

Original language	English
Pages (from-to)	9139-9152
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	34
Issue number	10
DOI	https://doi.org/10.1109/TCSVT.2024.3398145
Publication status	Published - 2024

Cite this

@article{3d0c84eb63f64d8a9d2952ff50637b96,

title = "Detail-Preserving and Diverse Image Translation for Adverse Visual Object Detection",

keywords = "domain adaptation, generative adversarial network, image-to-image translation, Object detection",

author = "Guolong Sun and Zhitong Xiong and Yuan Yuan",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.",

year = "2024",

doi = "10.1109/TCSVT.2024.3398145",

language = "English",

volume = "34",

pages = "9139--9152",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "10",

}

TY  - JOUR

T1  - Detail-Preserving and Diverse Image Translation for Adverse Visual Object Detection

AU  - Sun, Guolong

AU  - Xiong, Zhitong

AU  - Yuan, Yuan

PY  - 2024

Y1  - 2024

KW  - domain adaptation

KW  - generative adversarial network

KW  - image-to-image translation

KW  - Object detection

UR  - http://www.scopus.com/inward/record.url?scp=85193013396&partnerID=8YFLogxK

U2  - 10.1109/TCSVT.2024.3398145

DO  - 10.1109/TCSVT.2024.3398145

M3  - Article

AN  - SCOPUS:85193013396

SN  - 1051-8215

VL  - 34

SP  - 9139

EP  - 9152

JO  - IEEE Transactions on Circuits and Systems for Video Technology

JF  - IEEE Transactions on Circuits and Systems for Video Technology

IS  - 10

ER  - 
