TY - JOUR
T1 - Detail-Preserving and Diverse Image Translation for Adverse Visual Object Detection
AU - Sun, Guolong
AU - Xiong, Zhitong
AU - Yuan, Yuan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The effectiveness of object detection is significantly hampered in challenging nighttime or rainy scenarios. This is due to the severe domain shifts between daytime and adverse-visual images. Previous methods have demonstrated that using image-to-image translation methods for data augmentation can effectively address domain shifts, but they may still fail to preserve image objects when faced with extremely adverse images such as rainy nights. In addition, achieving diversity in the generated results remains challenging. To this end, we propose a Progressive Adverse Image Translation (PAIT) framework that tackles domain shifts by generating diverse and detail-preserving images. The main contributions of this paper are as follows. 1) We propose a novel PAIT framework, which incorporates an iterative mapping module and a slicing layer. This framework enables the progressive generation of increasingly challenging images in a fine-to-coarse manner. 2) To preserve the details of the images, we introduce an iterative mapping module to generate smooth style transform curves. 3) To enhance the diversity of synthesized images, a simple but efficient end-to-end optimization method is proposed. 4) Through a quantitative analysis, we found a strong correlation between the style diversity of augmented images and the performance of the detection model, highlighting the crucial role of style diversity in enhancing the model’s generalizability. Our framework achieves state-of-the-art performance on multiple challenging visual datasets, surpassing the current state-of-the-art methods by 27% (+8.0 AP). Moreover, our approach and modules can be easily extended to different detectors and other domain adaptation methods, making it a versatile solution for object detection in adverse visual environments. Our code will be available at https://github.com/ssunguotu/Diverse-Aug.
KW - domain adaptation
KW - generative adversarial network
KW - image-to-image translation
KW - object detection
UR - http://www.scopus.com/inward/record.url?scp=85193013396&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2024.3398145
DO - 10.1109/TCSVT.2024.3398145
M3 - Article
AN - SCOPUS:85193013396
SN - 1051-8215
VL - 34
SP - 9139
EP - 9152
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 10
ER -