Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong; Xing Zhang; Bihui Chen; Dawei Yan; Zhijun Lin; Qingsen Yan; Peng Wang; Yang Yang

doi:10.1109/CVPR52733.2024.01524

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge. Currently, there is a lack of focus on guiding this delicate trade-off. In this study, we approach the problem from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods. Building upon this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also ensures that new parameters do not deviate excessively from the pre-trained model through a residual design. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks, all while maintaining comparable new parameters. We believe this work takes a step forward in offering a unified perspective for interpreting existing methods and serves as motivation for the development of new approaches that move closer to effectively considering the crucial trade-off mentioned above. Our code is available at https://github.com/zstarN70/RLRR.git.

Original language	English
Title of host publication	Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher	IEEE Computer Society
Pages	16101-16110
Number of pages	10
ISBN (Electronic)	9798350353006
DOIs	https://doi.org/10.1109/CVPR52733.2024.01524
State	Published - 2024
Event	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States Duration: 16 Jun 2024 → 22 Jun 2024

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory	United States
City	Seattle
Period	16/06/24 → 22/06/24

Keywords

Low-Rank Adaptation
Parameter-efficient fine-tuning;
Singular Value Decomposition;

Access to Document

10.1109/CVPR52733.2024.01524

Cite this

Dong, W., Zhang, X., Chen, B., Yan, D., Lin, Z., Yan, Q., Wang, P., & Yang, Y. (2024). Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach. In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 (pp. 16101-16110). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR52733.2024.01524

@inproceedings{9d4ce7e2bef1402c8c873a38f8391d60,

title = "Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach",

abstract = "Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge. Currently, there is a lack of focus on guiding this delicate trade-off. In this study, we approach the problem from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods. Building upon this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also ensures that new parameters do not deviate excessively from the pre-trained model through a residual design. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks, all while maintaining comparable new parameters. We believe this work takes a step forward in offering a unified perspective for interpreting existing methods and serves as motivation for the development of new approaches that move closer to effectively considering the crucial trade-off mentioned above. Our code is available at https://github.com/zstarN70/RLRR.git.",

keywords = "Low-Rank Adaptation, Parameter-efficient fine-tuning;, Singular Value Decomposition;",

author = "Wei Dong and Xing Zhang and Bihui Chen and Dawei Yan and Zhijun Lin and Qingsen Yan and Peng Wang and Yang Yang",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 ; Conference date: 16-06-2024 Through 22-06-2024",

year = "2024",

doi = "10.1109/CVPR52733.2024.01524",

language = "英语",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "16101--16110",

booktitle = "Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024",

}

Dong, W, Zhang, X, Chen, B, Yan, D, Lin, Z, Yan, Q, Wang, P & Yang, Y 2024, Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach. in Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 16101-16110, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, United States, 16/06/24. https://doi.org/10.1109/CVPR52733.2024.01524

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach. / Dong, Wei; Zhang, Xing; Chen, Bihui et al.
Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society, 2024. p. 16101-16110 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Low-Rank Rescaled Vision Transformer Fine-Tuning

T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

AU - Dong, Wei

AU - Zhang, Xing

AU - Chen, Bihui

AU - Yan, Dawei

AU - Lin, Zhijun

AU - Yan, Qingsen

AU - Wang, Peng

AU - Yang, Yang

PY - 2024

Y1 - 2024

N2 - Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge. Currently, there is a lack of focus on guiding this delicate trade-off. In this study, we approach the problem from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods. Building upon this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also ensures that new parameters do not deviate excessively from the pre-trained model through a residual design. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks, all while maintaining comparable new parameters. We believe this work takes a step forward in offering a unified perspective for interpreting existing methods and serves as motivation for the development of new approaches that move closer to effectively considering the crucial trade-off mentioned above. Our code is available at https://github.com/zstarN70/RLRR.git.

AB - Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge. Currently, there is a lack of focus on guiding this delicate trade-off. In this study, we approach the problem from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods. Building upon this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also ensures that new parameters do not deviate excessively from the pre-trained model through a residual design. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks, all while maintaining comparable new parameters. We believe this work takes a step forward in offering a unified perspective for interpreting existing methods and serves as motivation for the development of new approaches that move closer to effectively considering the crucial trade-off mentioned above. Our code is available at https://github.com/zstarN70/RLRR.git.

KW - Low-Rank Adaptation

KW - Parameter-efficient fine-tuning;

KW - Singular Value Decomposition;

UR - http://www.scopus.com/inward/record.url?scp=85206215807&partnerID=8YFLogxK

U2 - 10.1109/CVPR52733.2024.01524

DO - 10.1109/CVPR52733.2024.01524

M3 - 会议稿件

AN - SCOPUS:85206215807

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 16101

EP - 16110

BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

PB - IEEE Computer Society

Y2 - 16 June 2024 through 22 June 2024

ER -

Dong W, Zhang X, Chen B, Yan D, Lin Z, Yan Q et al. Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach. In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society. 2024. p. 16101-16110. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR52733.2024.01524

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Abstract

Publication series

Conference

Keywords

Access to Document

Other files and links

Fingerprint

Cite this