TY - JOUR
T1 - Adapt Anything
T2 - Tailor Any Image Classifier Across Domains and Categories Using Text-to-Image Diffusion Models
AU - Chen, Weijie
AU - Wang, Haoyu
AU - Yang, Shicai
AU - Zhang, Lei
AU - Wei, Wei
AU - Zhang, Yanning
AU - Lin, Luojun
AU - Xie, Di
AU - Zhuang, Yueting
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - We study a novel problem in this manuscript: whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data can serve as a surrogate for real-world source data. In this way, we no longer need to collect and annotate source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to distill knowledge from the task-agnostic text-to-image generator into the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator and any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.
AB - We study a novel problem in this manuscript: whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data can serve as a surrogate for real-world source data. In this way, we no longer need to collect and annotate source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to distill knowledge from the task-agnostic text-to-image generator into the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator and any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.
KW - Data Synthesis
KW - Prompt Diversification
KW - Text-to-Image Diffusion Models
KW - Unsupervised Domain Adaptation
UR - http://www.scopus.com/inward/record.url?scp=85217037793&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2025.3536933
DO - 10.1109/TBDATA.2025.3536933
M3 - Article
AN - SCOPUS:85217037793
SN - 2332-7790
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
ER -