Method for Accelerating LLM Inference Procedures
Summary
The USPTO has published patent application US20260094028A1, assigned to MEDIATEK INC., which discloses a method for accelerating large language model (LLM) inference through draft token generation, a rule-based determination, and a matching operation. The method aims to improve the computational efficiency of LLM inference by using up to two drafting procedures followed by a matching step. The application was filed on September 26, 2025.
What changed
MEDIATEK INC. filed a patent application disclosing a method for accelerating LLM inference procedures. The method generates draft tokens through a first drafting procedure, determines whether an acceleration rule is met, generates additional draft tokens through a second drafting procedure if the rule is not met, inputs the resulting formal draft tokens to the LLM to produce target tokens, and performs a matching operation between the formal draft tokens and the target tokens to produce the output tokens. The application is classified under CPC G06N 5/04 and G06F 40/284 and carries Application No. 19340885.
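To make the matching step concrete, here is a minimal Python sketch. The published abstract does not say how the matching operation works, so this assumes the greedy prefix matching commonly used in speculative decoding: draft tokens are accepted while they agree with the LLM's target tokens, and the LLM's own token is emitted at the first disagreement. The function name `match_tokens`, the integer token IDs, and the convention that the LLM returns one more target token than there are draft tokens are all illustrative assumptions, not details from the application.

```python
from typing import List


def match_tokens(formal_draft: List[int], target: List[int]) -> List[int]:
    """Greedy prefix matching (an assumed scheme, not taken from the patent).

    target[i] is assumed to be the LLM's prediction for position i given the
    context plus formal_draft[:i], so target holds len(formal_draft) + 1 tokens.
    """
    accepted: List[int] = []
    for i, draft_tok in enumerate(formal_draft):
        # Stop at the first position where draft and target disagree.
        if i >= len(target) or draft_tok != target[i]:
            break
        accepted.append(draft_tok)
    # Emit the LLM's own token at the stopping position: the correction on a
    # mismatch, or the extra token when every draft token was accepted.
    if len(accepted) < len(target):
        accepted.append(target[len(accepted)])
    return accepted


if __name__ == "__main__":
    print(match_tokens([5, 6, 7], [5, 6, 9, 4]))  # third draft token rejected
    print(match_tokens([5, 6, 7], [5, 6, 7, 8]))  # all draft tokens accepted
```

Under these assumptions, one LLM pass yields between one and `len(formal_draft) + 1` output tokens, which is where the speed-up over token-by-token decoding would come from.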
Technology companies developing or deploying large language models should monitor this application's prosecution. Patent applications create no immediate compliance obligations, but if a patent is granted, the claimed technique may become relevant for companies implementing LLM inference acceleration. Patent prosecution typically spans 2-3 years before a grant or rejection decision. No action is required at this stage.
Source document (simplified)
METHOD FOR PERFORMING ACCELERATION PROCEDURE TO ACCELERATE INFERENCE PROCEDURE OF LARGE LANGUAGE MODEL
Application US20260094028A1 | Kind: A1 | Apr 02, 2026
Assignee
MEDIATEK INC.
Inventors
Yue-Ting Pan, Huai-Ting Li, Yi-Min Tsai, Ya-Lin Huang, I-Lin Chen
Abstract
A method for performing an acceleration procedure to accelerate an inference procedure of a large language model (LLM) includes: performing a first drafting procedure to generate multiple first draft tokens; according to first draft information related to the multiple first draft tokens, determining whether a first rule is met to generate a first determination result, wherein the first rule corresponds to the first acceleration procedure; in response to the first determination result indicating that the first rule is not met, performing a second drafting procedure to generate multiple second draft tokens; obtaining multiple formal draft tokens at least based on the multiple second draft tokens; inputting the multiple formal draft tokens to the LLM in order to generate multiple target tokens; and performing a matching operation upon the multiple formal draft tokens and the multiple target tokens to generate at least one output token of the LLM.
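The rule-gated drafting flow in the abstract can be sketched as follows. This is a hedged illustration only: the abstract does not define the "first draft information" or the "first rule", so the sketch assumes per-token confidence scores and a minimum-confidence threshold. It also assumes the first draft tokens are used directly when the rule is met, a case the abstract does not describe. All names here (`select_formal_draft`, `cheap_drafter`, `fallback_drafter`, `min_confidence`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A drafter maps a token context to (draft tokens, per-token confidences).
Drafter = Callable[[Sequence[int]], Tuple[List[int], List[float]]]


def select_formal_draft(
    context: Sequence[int],
    first_drafter: Drafter,
    second_drafter: Drafter,
    min_confidence: float = 0.6,  # assumed rule threshold, not from the patent
) -> Tuple[List[int], str]:
    """Return the formal draft tokens and which drafting procedure supplied them."""
    first_tokens, first_conf = first_drafter(context)
    # Assumed "first rule": every first draft token is confident enough.
    rule_met = bool(first_conf) and min(first_conf) >= min_confidence
    if rule_met:
        # Assumption: the first draft is used as-is when the rule is met.
        return first_tokens, "first"
    # Rule not met: run the second drafting procedure, as in the abstract.
    second_tokens, _ = second_drafter(context)
    return second_tokens, "second"


# Toy drafters for illustration; real drafters would be small models or lookups.
def cheap_drafter(context: Sequence[int]) -> Tuple[List[int], List[float]]:
    last = context[-1]
    return [last + 1, last + 2], [0.9, 0.4]  # second token has low confidence


def fallback_drafter(context: Sequence[int]) -> Tuple[List[int], List[float]]:
    last = context[-1]
    return [last + 1, last + 2, last + 3], [1.0, 1.0, 1.0]


if __name__ == "__main__":
    tokens, source = select_formal_draft([1, 2, 3], cheap_drafter, fallback_drafter)
    print(tokens, source)
```

The formal draft tokens returned here would then be passed to the LLM in a single forward pass to obtain the target tokens used in the matching operation.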
CPC Classifications
G06N 5/04 G06F 40/284
Filing Date
2025-09-26
Application No.
19340885