TY - GEN
T1 - Any-Size-Diffusion
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Zheng, Qingping
AU - Guo, Yuanfan
AU - Deng, Jiankang
AU - Han, Jianhua
AU - Li, Ying
AU - Xu, Songcen
AU - Xu, Hang
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size, while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks show that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2× compared to the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
AB - Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size, while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks show that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2× compared to the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
UR - http://www.scopus.com/inward/record.url?scp=85189554618&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i7.28589
DO - 10.1609/aaai.v38i7.28589
M3 - Conference contribution
AN - SCOPUS:85189554618
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 7571
EP - 7578
BT - Technical Tracks 14
A2 - Wooldridge, Michael
A2 - Dy, Jennifer
A2 - Natarajan, Sriraam
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 February 2024 through 27 February 2024
ER -