Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

Wei Dong; Yuan Sun; Yiting Yang; Xing Zhang; Zhijun Lin; Qingsen Yan; Haokui Zhang; Peng Wang; Yang Yang; Hengtao Shen

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, Hengtao Shen

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

Abstract

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.

Original language	English
Journal	Advances in Neural Information Processing Systems
Volume	37
State	Published - 2024
Event	38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada Duration: 9 Dec 2024 → 15 Dec 2024

Cite this

@article{521c19adb3b1468aa1b16b3442a8d5af,

title = "Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation",

abstract = "A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.",

author = "Wei Dong and Yuan Sun and Yiting Yang and Xing Zhang and Zhijun Lin and Qingsen Yan and Haokui Zhang and Peng Wang and Yang Yang and Hengtao Shen",

note = "Publisher Copyright: {\textcopyright} 2024 Neural information processing systems foundation. All rights reserved.; 38th Conference on Neural Information Processing Systems, NeurIPS 2024 ; Conference date: 09-12-2024 Through 15-12-2024",

year = "2024",

language = "英语",

volume = "37",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

publisher = "Neural information processing systems foundation",

}

TY - JOUR

T1 - Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

AU - Dong, Wei

AU - Sun, Yuan

AU - Yang, Yiting

AU - Zhang, Xing

AU - Lin, Zhijun

AU - Yan, Qingsen

AU - Zhang, Haokui

AU - Wang, Peng

AU - Yang, Yang

AU - Shen, Hengtao

PY - 2024

Y1 - 2024

N2 - A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.

AB - A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.

UR - http://www.scopus.com/inward/record.url?scp=105000557969&partnerID=8YFLogxK

M3 - 会议文章

AN - SCOPUS:105000557969

SN - 1049-5258

VL - 37

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024

Y2 - 9 December 2024 through 15 December 2024

ER -

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

Abstract

Other files and links

Fingerprint

Cite this