TRAINING-FREE ERROR COMPENSATION FOR A COMPRESSED LARGE LANGUAGE MODEL
Inventors
Min-Hung Chen, Shih-Yang Liu, Pavlo Molchanov, Maksim Khadkevich, Charbel Sakr, Chien-Yi Wang, Saurav Muralidharan, Hongxu Yin, Huck Yang, Jan Kautz, Frank Wang
Abstract
Large language models (LLMs) learn via machine learning to understand and generate human-like text, and thus are powerful when used for various language-based tasks, such as text summarization, translation, and content generation. However, to provide superior performance, an LLM is often of considerable model size and incurs high inference costs. To mitigate the size and execution costs of LLMs, methods have been developed to compress LLMs specifically. However, most existing methods either incur significant accuracy degradation compared to uncompressed models or require long training times, while their adaptability is often constrained by a limited range of hardware-supported compression formats. The present disclosure provides error compensation for a compressed LLM in a training-free manner, offering flexibility for diverse performance needs.
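The abstract does not detail the compensation scheme itself. Below is a minimal illustrative sketch, assuming one common form of training-free error compensation: a weight matrix is compressed (here, stand-in 4-bit group quantization), the compression error is approximated with a low-rank SVD correction, and the small correction factors are added back at inference without any gradient updates. The function names, the rank, and the group size are illustrative assumptions, not the disclosed method.

```python
import numpy as np

def fake_quantize_int4(w, group_size=64):
    """Symmetric 4-bit group quantization followed by dequantization;
    a stand-in for any hardware-supported compression format."""
    orig_shape = w.shape
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                      # avoid divide-by-zero
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(orig_shape)

def low_rank_compensation(w, w_compressed, rank=16):
    """Rank-r SVD approximation of the compression error W - W_c.
    The two small factors are stored alongside the compressed weight
    and added back at inference; no training is required."""
    err = w - w_compressed
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    a = u[:, :rank] * s[:rank]                   # (out_dim, rank)
    b = vt[:rank, :]                             # (rank, in_dim)
    return a, b

# Toy weight matrix standing in for one linear layer of an LLM.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

w_c = fake_quantize_int4(w)
a, b = low_rank_compensation(w, w_c, rank=16)

x = rng.standard_normal((8, 256)).astype(np.float32)
y_ref = x @ w.T
y_quant = x @ w_c.T
y_comp = x @ (w_c + a @ b).T                     # compensated forward pass

print("mean abs error without compensation:", np.abs(y_ref - y_quant).mean())
print("mean abs error with compensation:   ", np.abs(y_ref - y_comp).mean())
```

Because the correction is computed directly from the already-trained weights, this style of compensation can be applied after any compression format and tuned (e.g., by rank) to trade accuracy against storage, which is consistent with the flexibility the abstract describes.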
CPC Classifications
Filing Date
2025-06-05
Application No.
19229953