SVGen: Interpretable Vector Graphics Generation with Large Language Models

  • Feiyu Wang
  • Zhiyuan Zhao
  • Yuandong Liu
  • Da Zhang
  • Junyu Gao
  • Hao Sun
  • Xuelong Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Scalable Vector Graphics (SVG) has become an indispensable technology in front-end development and UI/UX design, owing to its scalability, editability, and rendering efficiency. When creating vector graphics, expressing a creative concept is easy, but translating it into a precise digital artwork is often challenging and time-consuming. To overcome this bottleneck and automate the path from concept to final product, we constructed SVG-1M, a large-scale dataset of high-quality SVG samples paired with textual descriptions. Through data augmentation and annotation pipelines, we built precisely aligned (text instruction, SVG code) training pairs, a subset of which is enriched with Chain-of-Thought (CoT) annotations, providing rich semantic supervision for model learning. Based on this dataset, we propose SVGen, an end-to-end generative model that converts natural language descriptions directly into SVG code. This design addresses the challenge of generating semantically accurate vector graphics while preserving complete structural information. We explored various training strategies and introduced a progressive curriculum learning approach, further optimized with reinforcement learning. Notably, this study applies the CoT paradigm to vector graphics generation, improving both the accuracy and the interpretability of SVG synthesis. Experiments show that SVGen offers significant advantages in SVG generation quality over general-purpose large models and generates results more efficiently than optimization-based rendering methods. By converting natural language directly into vector graphics, the method enables new workflows such as real-time AI-assisted design iteration. Code, model, and data are released at: https://github.com/gitcat-404/SVGen.
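The abstract does not spell out the training details. As an illustration only, the Python sketch below shows one way a (text instruction, SVG code) record, an easy-to-hard curriculum ordering, and a structural-validity reward for reinforcement learning could be expressed. Every name here (`TrainingPair`, `complexity`, `curriculum_order`, `validity_reward`) and the complexity heuristic (element count plus path-command count) are hypothetical assumptions, not SVGen's actual implementation; the released repository is the authoritative source for the authors' code.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Matches SVG path commands inside a "d" attribute (hypothetical difficulty signal).
PATH_CMD = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")


@dataclass
class TrainingPair:
    """Hypothetical record for one (text instruction, SVG code) pair."""
    instruction: str
    svg_code: str
    cot: Optional[str] = None  # optional Chain-of-Thought annotation


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://www.w3.org/2000/svg}path' -> 'path'."""
    return tag.rsplit("}", 1)[-1]


def parse_svg(svg_code: str) -> Optional[ET.Element]:
    """Return the root element if svg_code is well-formed XML with an <svg> root, else None."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        return None
    return root if _local(root.tag) == "svg" else None


def complexity(svg_code: str) -> int:
    """Assumed curriculum difficulty: number of child elements plus path commands."""
    root = parse_svg(svg_code)
    if root is None:
        return 0
    elements = sum(1 for _ in root.iter()) - 1  # exclude the <svg> root itself
    commands = sum(
        len(PATH_CMD.findall(el.get("d", "")))
        for el in root.iter()
        if _local(el.tag) == "path"
    )
    return elements + commands


def curriculum_order(pairs: List[TrainingPair]) -> List[TrainingPair]:
    """Sort pairs from simple to complex according to the heuristic above."""
    return sorted(pairs, key=lambda p: complexity(p.svg_code))


def validity_reward(generated: str) -> float:
    """Assumed binary RL reward: 1.0 if the output parses as an SVG document, else 0.0."""
    return 1.0 if parse_svg(generated) is not None else 0.0


if __name__ == "__main__":
    pairs = [
        TrainingPair("a triangle", '<svg xmlns="http://www.w3.org/2000/svg">'
                     '<path d="M0 0 L10 0 L5 8 Z"/><rect width="1" height="1"/></svg>'),
        TrainingPair("a red circle", '<svg xmlns="http://www.w3.org/2000/svg">'
                     '<circle cx="5" cy="5" r="4" fill="red"/></svg>'),
    ]
    print([p.instruction for p in curriculum_order(pairs)])  # ['a red circle', 'a triangle']
    print(validity_reward("<svg><g></svg>"))  # 0.0 (malformed XML)
```

A binary parse check is only the crudest possible reward; a real system would likely combine it with rendered-image or text-alignment scores, which this sketch deliberately leaves out.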

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 9608-9617
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • chain-of-thought
  • generative models
  • large language models
  • scalable vector graphics
