Quantized Memory-Efficient Full-Parameter Tuning with Sign Descent Optimization

  • Xuezhi Zhao
  • Haichen Bai
  • Qiang Li
  • Qi Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Full-Parameter Fine-Tuning (FPFT) has become the preferred method for adapting LLMs to downstream tasks due to its exceptional performance. Current methods primarily rely on zeroth-order optimizers or fuse gradient computation with parameter updates to conserve GPU memory. However, they fail to exploit optimizer-state information (e.g., momentum, variance), leading to suboptimal convergence and instability during training. To address this, we propose SQ-MEFT, a Quantized Memory-Efficient Full-Parameter Tuning framework with Sign descent optimization. First, we construct a novel optimizer that uses the sign of the momentum as the update direction, maximizing the benefit of momentum information. In addition, to preserve memory efficiency, we apply 4-bit quantization to the momentum while synchronously computing gradients and performing updates. When trained with mixed precision, our optimizer reduces the total memory footprint by up to 7× compared to AdamW.
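As a rough illustration of the idea described in the abstract, the sketch below shows a sign-of-momentum update step with the momentum held in a quantized buffer. The class name SignMomentumSketch, the per-tensor symmetric quantization, and the use of int8 storage as a stand-in for packed 4-bit codes are all assumptions made for illustration; it is not the authors' SQ-MEFT implementation, and the fused gradient computation and any block-wise quantization details from the paper are not reproduced here.

```python
import torch

class SignMomentumSketch(torch.optim.Optimizer):
    """Hypothetical sketch: sign-of-momentum updates with the momentum
    kept in a quantized buffer. Illustrative only, not SQ-MEFT itself."""

    def __init__(self, params, lr=1e-4, beta=0.9):
        super().__init__(params, dict(lr=lr, beta=beta))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, beta = group["lr"], group["beta"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "m_q" not in state:
                    # Quantized momentum codes (int8 standing in for packed
                    # 4-bit values) plus a per-tensor scale factor.
                    state["m_q"] = torch.zeros_like(p, dtype=torch.int8)
                    state["scale"] = torch.tensor(1e-8, device=p.device)
                # Dequantize the stored momentum and mix in the new gradient.
                m = state["m_q"].float() * state["scale"]
                m.mul_(beta).add_(p.grad, alpha=1 - beta)
                # Sign descent: only the sign of the momentum drives the update.
                p.add_(torch.sign(m), alpha=-lr)
                # Re-quantize the momentum to the symmetric 4-bit range [-7, 7].
                scale = m.abs().max().clamp_min(1e-12) / 7.0
                state["m_q"] = torch.clamp((m / scale).round(), -7, 7).to(torch.int8)
                state["scale"] = scale
```

In this sketch the momentum is the only persistent optimizer state, and it is stored in 4-bit range rather than FP32, which is where the memory saving relative to AdamW (two FP32 state tensors per parameter) would come from.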

Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: Journey to the Center of Machine Imagination, ICME 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331594954
State: Published - 2025
Event: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025 - Nantes, France
Duration: 30 Jun 2025 – 4 Jul 2025

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Country/Territory: France
City: Nantes
Period: 30/06/25 – 04/07/25

Keywords

  • Full Parameter Fine-Tuning
  • Memory Efficient
  • Quantization
  • Sign Descent
