TY - GEN
T1 - Any-Size-Diffusion
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Zheng, Qingping
AU - Guo, Yuanfan
AU - Deng, Jiankang
AU - Han, Jianhua
AU - Li, Ying
AU - Xu, Songcen
AU - Xu, Hang
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size, while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks show that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2× compared to the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
AB - Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size, while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks show that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2× compared to the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
UR - http://www.scopus.com/inward/record.url?scp=85189554618&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i7.28589
DO - 10.1609/aaai.v38i7.28589
M3 - Conference contribution
AN - SCOPUS:85189554618
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 7571
EP - 7578
BT - Technical Tracks 14
A2 - Wooldridge, Michael
A2 - Dy, Jennifer
A2 - Natarajan, Sriraam
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 February 2024 through 27 February 2024
ER -