METHOD FOR PERFORMING ACCELERATION PROCEDURE TO ACCELERATE INFERENCE PROCEDURE OF LARGE LANGUAGE MODEL
Assignee
MEDIATEK INC.
Inventors
Yue-Ting Pan, Huai-Ting Li, Yi-Min Tsai, Ya-Lin Huang, I-Lin Chen
Abstract
A method for performing an acceleration procedure to accelerate an inference procedure of a large language model (LLM) includes: performing a first drafting procedure to generate multiple first draft tokens; according to first draft information related to the multiple first draft tokens, determining whether a first rule is met to generate a first determination result, wherein the first rule corresponds to a first acceleration procedure; in response to the first determination result indicating that the first rule is not met, performing a second drafting procedure to generate multiple second draft tokens; obtaining multiple formal draft tokens at least based on the multiple second draft tokens; inputting the multiple formal draft tokens to the LLM in order to generate multiple target tokens; and performing a matching operation upon the multiple formal draft tokens and the multiple target tokens to generate at least one output token of the LLM.
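The flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the application: the toy target model `target_next`, the two drafters, the mean-confidence threshold used as the "first rule", and the choice to use the first draft tokens directly when the rule is met (the abstract only specifies the branch where the rule is not met). The matching operation is assumed to be the common longest-matching-prefix scheme, followed by one corrected token from the target model.

```python
from typing import List, Tuple

Token = int


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the LLM: the next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


def first_drafter(context: List[Token], k: int) -> Tuple[List[Token], List[float]]:
    """Hypothetical cheap drafter: repeats the last token, with low confidence."""
    return [context[-1]] * k, [0.2] * k


def second_drafter(context: List[Token], k: int) -> Tuple[List[Token], List[float]]:
    """Hypothetical stronger drafter: counts upward, but gets the final token wrong."""
    tokens = [(context[-1] + i + 1) % 10 for i in range(k)]
    tokens[-1] = context[-1]  # deliberate mistake so the matching step has work to do
    return tokens, [0.9] * k


def rule_met(scores: List[float], threshold: float = 0.5) -> bool:
    """Assumed form of the 'first rule': mean draft confidence reaches a threshold."""
    return sum(scores) / len(scores) >= threshold


def verify(context: List[Token], draft: List[Token]) -> List[Token]:
    """Target tokens for every draft position plus one extra position.

    A real LLM would produce these in a single forward pass over the draft;
    the toy model is simply called once per position.
    """
    return [target_next(context + draft[:i]) for i in range(len(draft) + 1)]


def match(draft: List[Token], target: List[Token]) -> List[Token]:
    """Accept the longest prefix where draft and target agree, then add one target token."""
    accepted: List[Token] = []
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted.append(d)
    accepted.append(target[len(accepted)])  # correction or bonus token from the target
    return accepted


def speculative_step(context: List[Token], k: int = 3) -> List[Token]:
    """One acceleration step: draft, check the rule, redraft if needed, verify, match."""
    first_tokens, first_scores = first_drafter(context, k)
    if rule_met(first_scores):
        formal = first_tokens  # assumption: branch not described in the abstract
    else:
        formal, _ = second_drafter(context, k)
    target = verify(context, formal)
    return match(formal, target)


output_tokens = speculative_step([1, 2, 3])
print(output_tokens)
```

In this sketch the first drafter's confidence falls below the threshold, so the second drafter supplies the formal draft tokens; the matching step then keeps the agreeing prefix and replaces the first mismatching draft token with the target model's token.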
CPC Classifications
Filing Date
2025-09-26
Application No.
19340885