Adapt Anything: Tailor Any Image Classifier across Domains And Categories Using Text-to-Image Diffusion Models

Weijie Chen; Haoyu Wang; Shicai Yang; Lei Zhang; Wei Wei; Yanning Zhang; Luojun Lin; Di Xie; Yueting Zhuang

doi:10.1109/TBDATA.2025.3536933


School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

We study a novel problem in this manuscript: whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data can serve as a surrogate for real-world source data. In this way, we do not need to collect and annotate source data for each image classification task in a one-for-one manner. Instead, we use a single off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to transfer knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator and any unlabeled target data. Extensive experiments validate the feasibility of this idea, which, surprisingly, even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.

Original language	English
Journal	IEEE Transactions on Big Data
DOI	https://doi.org/10.1109/TBDATA.2025.3536933
Publication status	Accepted/In press - 2025


Cite this

@article{834a161d6a50499fbe78791aba8d0603,

title = "Adapt Anything: Tailor Any Image Classifier across Domains And Categories Using Text-to-Image Diffusion Models",

abstract = "We study a novel problem in this manuscript: whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data can serve as a surrogate for real-world source data. In this way, we do not need to collect and annotate source data for each image classification task in a one-for-one manner. Instead, we use a single off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to transfer knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator and any unlabeled target data. Extensive experiments validate the feasibility of this idea, which, surprisingly, even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.",

keywords = "Data Synthesis, Prompt Diversification, Text-to-Image Diffusion Models, Unsupervised Domain Adaptation",

author = "Weijie Chen and Haoyu Wang and Shicai Yang and Lei Zhang and Wei Wei and Yanning Zhang and Luojun Lin and Di Xie and Yueting Zhuang",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.",

year = "2025",

doi = "10.1109/TBDATA.2025.3536933",

language = "English",

journal = "IEEE Transactions on Big Data",

issn = "2332-7790",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Adapt Anything

T2 - Tailor Any Image Classifier across Domains And Categories Using Text-to-Image Diffusion Models

AU - Chen, Weijie

AU - Wang, Haoyu

AU - Yang, Shicai

AU - Zhang, Lei

AU - Wei, Wei

AU - Zhang, Yanning

AU - Lin, Luojun

AU - Xie, Di

AU - Zhuang, Yueting

PY - 2025

Y1 - 2025

N2 - We study a novel problem in this manuscript: whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data can serve as a surrogate for real-world source data. In this way, we do not need to collect and annotate source data for each image classification task in a one-for-one manner. Instead, we use a single off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to transfer knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator and any unlabeled target data. Extensive experiments validate the feasibility of this idea, which, surprisingly, even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.

AB - We study a novel problem in this manuscript: whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data can serve as a surrogate for real-world source data. In this way, we do not need to collect and annotate source data for each image classification task in a one-for-one manner. Instead, we use a single off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to transfer knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator and any unlabeled target data. Extensive experiments validate the feasibility of this idea, which, surprisingly, even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.

KW - Data Synthesis

KW - Prompt Diversification

KW - Text-to-Image Diffusion Models

KW - Unsupervised Domain Adaptation

UR - http://www.scopus.com/inward/record.url?scp=85217037793&partnerID=8YFLogxK

U2 - 10.1109/TBDATA.2025.3536933

DO - 10.1109/TBDATA.2025.3536933

M3 - Article

AN - SCOPUS:85217037793

SN - 2332-7790

JO - IEEE Transactions on Big Data

JF - IEEE Transactions on Big Data

ER -
