METHOD FOR PERFORMING ACCELERATION PROCEDURE TO ACCELERATE INFERENCE PROCEDURE OF LARGE LANGUAGE MODEL

Application US20260094028A1 Kind: A1 Apr 02, 2026

Assignee

MEDIATEK INC.

Inventors

Yue-Ting Pan, Huai-Ting Li, Yi-Min Tsai, Ya-Lin Huang, I-Lin Chen

Abstract

A method for performing an acceleration procedure to accelerate an inference procedure of a large language model (LLM) includes: performing a first drafting procedure to generate multiple first draft tokens; according to first draft information related to the multiple first draft tokens, determining whether a first rule is met to generate a first determination result, wherein the first rule corresponds to a first acceleration procedure; in response to the first determination result indicating that the first rule is not met, performing a second drafting procedure to generate multiple second draft tokens; obtaining multiple formal draft tokens at least based on the multiple second draft tokens; inputting the multiple formal draft tokens to the LLM in order to generate multiple target tokens; and performing a matching operation upon the multiple formal draft tokens and the multiple target tokens to generate at least one output token of the LLM.
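The abstract does not say what the drafting procedures, the first rule, or the matching operation actually are. The Python sketch below shows one plausible reading of the claimed flow, under loudly stated assumptions: the first drafting procedure is a prompt-lookup (n-gram copy) drafter, the first rule is a hypothetical minimum draft length, the second drafting procedure is a toy draft model, the formal draft tokens are taken directly from whichever drafts are used (the rule-met branch is not described in the abstract), and the matching operation is greedy speculative-decoding verification. Every function name here is hypothetical, and the toy "LLM" simply counts upward modulo 10; it is not MediaTek's implementation.

```python
from typing import Callable, List, Sequence

Token = int
Drafter = Callable[[Sequence[Token]], List[Token]]


def prompt_lookup_drafter(context: Sequence[Token], k: int = 3) -> List[Token]:
    """Hypothetical first drafting procedure: copy the k tokens that followed
    the most recent earlier occurrence of the last context token."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return list(context[i + 1 : i + 1 + k])
    return []  # no earlier occurrence: nothing to draft


def min_length_rule(drafts: Sequence[Token], min_len: int = 2) -> bool:
    """Hypothetical first rule: accept the first drafts only if there are enough of them."""
    return len(drafts) >= min_len


def toy_draft_model(context: Sequence[Token], k: int = 4) -> List[Token]:
    """Hypothetical second drafting procedure: a small model that autoregressively
    guesses k tokens. It mimics the target but deliberately errs when the next token is 6."""
    drafts: List[Token] = []
    last = context[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 6:
            nxt = 0  # injected mistake so verification has something to reject
        drafts.append(nxt)
        last = nxt
    return drafts


def toy_llm_verify(context: Sequence[Token], drafts: Sequence[Token]) -> List[Token]:
    """Stand-in for the LLM: for each prefix context + drafts[:i] (i = 0..len(drafts)),
    return the token the target would generate next. A real LLM obtains all of these
    from one forward pass over the drafts; here each is computed separately."""
    def next_token(seq: Sequence[Token]) -> Token:
        return (seq[-1] + 1) % 10  # toy target "model": count upward modulo 10

    return [next_token(list(context) + list(drafts[:i])) for i in range(len(drafts) + 1)]


def greedy_match(formal: Sequence[Token], target: Sequence[Token]) -> List[Token]:
    """Assumed matching operation: keep drafts while they equal the target tokens,
    then emit the target token at the first mismatch (or the extra final target token)."""
    output: List[Token] = []
    for draft, tgt in zip(formal, target):
        if draft != tgt:
            output.append(tgt)  # correction supplied by the LLM
            return output
        output.append(draft)
    output.append(target[len(formal)])  # every draft accepted: bonus token
    return output


def acceleration_step(
    context: Sequence[Token],
    first_drafter: Drafter = prompt_lookup_drafter,
    first_rule: Callable[[Sequence[Token]], bool] = min_length_rule,
    second_drafter: Drafter = toy_draft_model,
    llm_verify: Callable[[Sequence[Token], Sequence[Token]], List[Token]] = toy_llm_verify,
) -> List[Token]:
    first_drafts = first_drafter(context)
    if first_rule(first_drafts):
        formal = first_drafts  # assumption: this branch is not described in the abstract
    else:
        formal = second_drafter(context)  # formal drafts taken directly from second drafts
    target = llm_verify(context, formal)
    return greedy_match(formal, target)


print(acceleration_step([1, 2, 3]))        # → [4, 5, 6]
print(acceleration_step([5, 6, 7, 8, 5]))  # → [6, 7, 8, 9]
```

With context [1, 2, 3] the lookup finds no earlier match, so the rule fails and the toy draft model proposes [4, 5, 0, 1]. The target tokens are [4, 5, 6, 1, 2], so two drafts are accepted and the LLM's 6 replaces the wrong draft, giving three output tokens from a single verification call. With context [5, 6, 7, 8, 5] the cheap lookup drafts [6, 7, 8] satisfy the rule, all three match, and the extra target token 9 is appended.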

CPC Classifications

G06N 5/04 G06F 40/284

Filing Date

2025-09-26

Application No.

19340885
