Revisiting the Power of Prompt for Visual Tuning

Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

Research output: Contribution to journal › Conference article › peer-review

Abstract

Visual prompt tuning (VPT) is a promising solution that incorporates learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often face challenges such as prompt initialization, prompt length, and subpar performance under self-supervised pre-training, hindering successful contextual adaptation. This study begins by exploring how the correlation between prompts and patch tokens evolves during proficient training. Motivated by the observation that prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. This strategic initialization, a stand-in for previous initialization schemes, substantially improves performance. To refine it further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no increase in computational cost compared to VPT. Exhaustive experiments show that our proposed approach outperforms existing methods by a remarkable margin. For instance, after MAE pre-training, our method improves accuracy by 10%-30% compared to VPT, and outperforms full fine-tuning in 19 of 24 cases while using less than 0.4% of the learnable parameters. The experimental results also demonstrate that the proposed SPT (Self-Prompt Tuning) is robust to prompt length and scales well with model capacity and training data size. We finally provide an insightful exploration of how much target data facilitates the adaptation of pre-trained models to downstream tasks. The code is available at https://github.com/WangYZ1608/Self-PromptTuning.
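To make the prototype-initialization idea concrete, below is a minimal sketch, not the authors' released implementation: patch tokens are collected from the frozen pre-trained backbone over the downstream training set and clustered, with the centroids serving as the initial prompt tokens. The `forward_features` call, the k-means step, and all names here are illustrative assumptions; see the linked repository for the actual code.

```python
# Illustrative sketch of prototype-based prompt initialization (assumed
# pipeline, not the authors' released code; see the repository above).
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def prototype_prompt_init(backbone, loader, num_prompts=50, device="cpu"):
    """Cluster downstream patch tokens from a frozen backbone; the cluster
    centroids become the initial values of the learnable prompt tokens."""
    backbone.eval().to(device)
    all_tokens = []
    for images, _ in loader:
        # Assumption: forward_features returns patch tokens of shape (B, N, D).
        tokens = backbone.forward_features(images.to(device))
        all_tokens.append(tokens.flatten(0, 1).cpu())
    all_tokens = torch.cat(all_tokens)                    # (num_tokens, D)
    km = KMeans(n_clusters=num_prompts, n_init=10).fit(all_tokens.numpy())
    return torch.tensor(km.cluster_centers_, dtype=torch.float32)

# Usage: wrap the prototypes as the learnable prompts before tuning.
# prompts = torch.nn.Parameter(prototype_prompt_init(vit, train_loader))
```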

Original language: English
Pages (from-to): 50233-50247
Number of pages: 15
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 - 27 Jul 2024
