TY - GEN
T1 - Quantized Memory-Efficient Full-Parameter Tuning with Sign Descent Optimization
AU - Zhao, Xuezhi
AU - Bai, Haichen
AU - Li, Qiang
AU - Wang, Qi
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Full Parameter Fine-Tuning (FPFT) has become the preferred method for adapting LLMs to downstream tasks due to its exceptional performance. Current methods primarily rely on zeroth-order optimizers or fuse gradient computation with parameter updates to conserve GPU memory. However, they fail to exploit optimizer state information (e.g., momentum, variance), leading to suboptimal convergence and instability during training. To address this, we propose a Quantized Memory-Efficient Full-Parameter Tuning framework with Sign descent optimization (SQ-MEFT). First, we construct a novel optimizer that uses the sign of the momentum as the update direction to maximize the potential of momentum. In addition, to maintain memory efficiency, we apply 4-bit quantization to the momentum while synchronously computing gradients and applying updates. When trained with mixed precision, our optimizer reduces the total memory footprint by up to 7× compared to AdamW.
AB - Full Parameter Fine-Tuning (FPFT) has become the preferred method for adapting LLMs to downstream tasks due to its exceptional performance. Current methods primarily rely on zeroth-order optimizers or fuse gradient computation with parameter updates to conserve GPU memory. However, they fail to exploit optimizer state information (e.g., momentum, variance), leading to suboptimal convergence and instability during training. To address this, we propose a Quantized Memory-Efficient Full-Parameter Tuning framework with Sign descent optimization (SQ-MEFT). First, we construct a novel optimizer that uses the sign of the momentum as the update direction to maximize the potential of momentum. In addition, to maintain memory efficiency, we apply 4-bit quantization to the momentum while synchronously computing gradients and applying updates. When trained with mixed precision, our optimizer reduces the total memory footprint by up to 7× compared to AdamW.
KW - Full Parameter Fine-Tuning
KW - Memory Efficient
KW - Quantization
KW - Sign Descent
UR - https://www.scopus.com/pages/publications/105022654965
U2 - 10.1109/ICME59968.2025.11209822
DO - 10.1109/ICME59968.2025.11209822
M3 - Conference contribution
AN - SCOPUS:105022654965
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Y2 - 30 June 2025 through 4 July 2025
ER -