Authors
Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang
Abstract
VPTQ (Vector Post-Training Quantization) uses second-order optimization and vector quantization to compress large language models (LLMs) to extremely low bit-widths (1 to 4 bits), enabling near-lossless quantization and fast inference with a substantially reduced memory footprint.
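The core mechanism can be sketched in a few lines. In vector quantization, consecutive weights are grouped into vectors of length v, and each vector is replaced by an index into a shared codebook of k entries. The index therefore costs log2(k)/v bits per weight: for example, v = 8 and k = 256 give 1 bit per weight, before counting codebook storage. The Python below is a minimal sketch under stated assumptions, not the VPTQ implementation. `build_codebook`, `quantize`, and `dequantize` are hypothetical helpers. The per-coordinate `importance` weights are an assumed stand-in for the second-order (Hessian) information the abstract mentions, and the error-compensating update steps of a full second-order method are omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def bits_per_weight(vector_len: int, codebook_size: int) -> float:
    """Index bits per scalar weight, ignoring the shared codebook storage."""
    return math.log2(codebook_size) / vector_len


def weighted_dist(x: Sequence[float], c: Sequence[float], h: Sequence[float]) -> float:
    """Squared error with each coordinate scaled by its importance h_j.

    ASSUMPTION: h is a stand-in for diagonal second-order (Hessian) information.
    """
    return sum(hj * (xj - cj) ** 2 for xj, cj, hj in zip(x, c, h))


def nearest(x: Sequence[float], codebook: List[Vector], h: Sequence[float]) -> int:
    """Index of the codebook entry closest to x under the weighted distance."""
    return min(range(len(codebook)), key=lambda c: weighted_dist(x, codebook[c], h))


def build_codebook(vectors: List[Vector], importance: List[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Importance-weighted k-means (hypothetical helper, not the VPTQ API)."""
    # Deterministic farthest-point initialization, so duplicate input vectors
    # cannot produce identical starting centroids.
    codebook = [list(vectors[0])]
    while len(codebook) < k:
        far = max(
            range(len(vectors)),
            key=lambda i: min(weighted_dist(vectors[i], c, importance[i]) for c in codebook),
        )
        codebook.append(list(vectors[far]))
    dim = len(vectors[0])
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        assign = [nearest(v, codebook, h) for v, h in zip(vectors, importance)]
        # Update step: the per-coordinate weighted mean minimizes
        # sum_i h_ij * (v_ij - c_j)^2 over the members of each cluster.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            for j in range(dim):
                den = sum(importance[i][j] for i in members)
                if den > 0:  # empty cluster or zero weight: keep the old value
                    codebook[c][j] = sum(importance[i][j] * vectors[i][j] for i in members) / den
    return codebook


def quantize(vectors: List[Vector], importance: List[Vector], codebook: List[Vector]) -> List[int]:
    """Replace each vector by the index of its nearest codebook entry."""
    return [nearest(v, codebook, h) for v, h in zip(vectors, importance)]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[Vector]:
    """Reconstruct approximate weights by codebook lookup."""
    return [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 9.9]]
    importance = [[1.0, 1.0]] * 4  # uniform weights: plain k-means
    cb = build_codebook(vectors, importance, k=2)
    print(quantize(vectors, importance, cb))  # → [0, 0, 1, 1]
    print(bits_per_weight(8, 256))  # → 1.0
```

Weighting the k-means objective by importance pulls centroids toward the coordinates whose errors are assumed to matter most. With uniform weights, as in the usage example, it reduces to ordinary k-means.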