Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

  • Sifeng Shang
  • Jiayi Zhou
  • Chenyu Lin
  • Minxian Li
  • Kaiyang Zhou*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, Conference proceeding, peer-review

Abstract

As the size of large language models grows exponentially, GPU memory has become a bottleneck for adapting these models to downstream tasks. In this paper, we aim to push the limits of memory-efficient training by minimizing memory usage on model weights, gradients, and optimizer states, within a unified framework. Our idea is to eliminate both gradients and optimizer states using zeroth-order optimization, which approximates gradients by perturbing weights during forward passes to identify gradient directions. To minimize memory usage on weights, we employ model quantization, e.g., converting from bfloat16 to int4. However, directly applying zeroth-order optimization to quantized weights is infeasible due to the precision gap between discrete weights and continuous gradients, which would otherwise require de-quantization and re-quantization. To overcome this challenge, we propose Quantized Zeroth-order Optimization (QZO), a simple yet effective approach that perturbs the continuous quantization scale for gradient estimation and uses a directional derivative clipping method to stabilize training. QZO is orthogonal to both scalar-based and codebook-based post-training quantization methods. Compared to full-parameter fine-tuning in 16 bits, QZO can reduce the total memory cost by more than 18× for 4-bit LLMs, and enables fine-tuning Llama-2-13B within a single 24GB GPU.
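
The idea in the abstract (perturb the continuous quantization scale, estimate a directional derivative from forward passes only, clip it, then update) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-point, central-difference estimator, one scale per output row, and a toy linear layer with a squared-error loss. The function names (`dequantize`, `loss_fn`, `zo_scale_step`) and the hyperparameters `lr`, `eps`, and `clip` are hypothetical choices made for the example.

```python
import numpy as np

def dequantize(codes, scales):
    # Fixed integer codes times a continuous per-row scale -> float weights.
    return scales[:, None] * codes

def loss_fn(codes, scales, x, y):
    # Mean squared error of a toy linear layer using de-quantized weights.
    pred = x @ dequantize(codes, scales).T
    return float(np.mean((pred - y) ** 2))

def zo_scale_step(codes, scales, x, y, rng, lr=0.02, eps=1e-3, clip=1.0):
    # Assumed two-point estimator: two forward passes, no backpropagation.
    z = rng.standard_normal(scales.shape)            # random direction
    loss_plus = loss_fn(codes, scales + eps * z, x, y)
    loss_minus = loss_fn(codes, scales - eps * z, x, y)
    d = (loss_plus - loss_minus) / (2.0 * eps)       # directional derivative
    d = float(np.clip(d, -clip, clip))               # clip to limit step size
    # Only the scales move; the integer codes are never touched.
    return scales - lr * d * z

rng = np.random.default_rng(0)
codes = rng.integers(-8, 8, size=(4, 8))             # int4-range codes
x = rng.standard_normal((64, 8)) / 8.0               # toy inputs
true_scales = np.array([0.5, 1.5, 0.8, 1.2])         # toy target scales
y = x @ dequantize(codes, true_scales).T

scales = np.ones(4)
initial_loss = loss_fn(codes, scales, x, y)
for _ in range(500):
    scales = zo_scale_step(codes, scales, x, y, rng)
final_loss = loss_fn(codes, scales, x, y)
print(initial_loss, final_loss)
```

In this sketch the weights stay in integer form throughout and the optimizer keeps no gradient or moment buffers, which is the memory argument the abstract makes. The clipping threshold only shrinks the step along the sampled direction and never flips its sign.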
Original language: English
Title of host publication: The Fourteenth International Conference on Learning Representations, ICLR 2026
Publisher: International Conference on Learning Representations, ICLR
Number of pages: 18
Publication status: Published - 23 Apr 2026
Event: 14th International Conference on Learning Representations, ICLR 2026 - Rio de Janeiro, Brazil
Duration: 23 Apr 2026 to 27 Apr 2026
https://iclr.cc/Conferences/2026 (Conference website)
https://openreview.net/group?id=ICLR.cc/2026 (Conference proceedings)
https://iclr.cc/virtual/2026/calendar (Conference schedule)

Publication series

Name: International Conference on Learning Representations

Conference

Conference: 14th International Conference on Learning Representations, ICLR 2026
Abbreviated title: ICLR 2026
Country/Territory: Brazil
City: Rio de Janeiro
Period: 23/04/26 to 27/04/26

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • cs.LG
  • cs.CL
  • cs.CV
