TY - GEN
T1 - SVGen
T2 - 33rd ACM International Conference on Multimedia, MM 2025
AU - Wang, Feiyu
AU - Zhao, Zhiyuan
AU - Liu, Yuandong
AU - Zhang, Da
AU - Gao, Junyu
AU - Sun, Hao
AU - Li, Xuelong
N1 - Publisher Copyright: © 2025 ACM.
PY - 2025/10/27
Y1 - 2025/10/27
N2 - Scalable Vector Graphics (SVG) has become an indispensable technology in front-end development and UI/UX design, due to its inherent advantages in scalability, editability, and rendering efficiency. In the creation of vector graphics, while expressing creative concepts is straightforward, translating them into precise digital artworks is often challenging and time-consuming. To overcome this technical bottleneck and achieve intelligent conversion from concept to final product, we have constructed SVG-1M, a large-scale dataset of high-quality SVG samples with paired textual descriptions. Through innovative data augmentation and annotation processes, we built precisely aligned "Text instruction-SVG code" training pairs, with a subset enhanced by Chain-of-Thought (CoT) annotations. This provides rich semantic supervision signals for model learning. Based on this dataset, we propose SVGen, an end-to-end generative model capable of directly converting natural language descriptions into SVG code. This design addresses the challenges of generating semantically accurate vector graphics while preserving complete structural information. We explored various training strategies and introduced a progressive curriculum learning approach, optimized with reinforcement learning algorithms. Notably, this study innovatively applies the CoT paradigm to vector graphics generation, effectively enhancing both the accuracy and interpretability of SVG synthesis. Experimental validation demonstrates that SVGen exhibits significant advantages over general large models in terms of SVG generation quality, while also surpassing optimization-based rendering methods in generation efficiency. The proposed method enables intelligent conversion between natural language and vector graphics, supporting novel workflows such as real-time AI-assisted design iteration. Code, model, and data are released at: https://github.com/gitcat-404/SVGen.
KW - chain-of-thought
KW - generative models
KW - large language models
KW - scalable vector graphics
UR - https://www.scopus.com/pages/publications/105024068507
U2 - 10.1145/3746027.3755011
DO - 10.1145/3746027.3755011
M3 - Conference contribution
AN - SCOPUS:105024068507
T3 - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
SP - 9608
EP - 9617
BT - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2025 through 31 October 2025
ER -