Method for Accelerating LLM Inference Procedures

ChangeBridge: Patent Apps - AI & Computing (G06N)

Published September 26th, 2025

Detected April 3rd, 2026

Summary

The USPTO published patent application US20260094028A1 by MEDIATEK INC., which discloses a method for accelerating large language model (LLM) inference through draft token generation, a rule-based determination, and a matching operation. The invention aims to improve the computational efficiency of LLM inference by drafting tokens in up to two stages and then matching the drafts against tokens generated by the LLM. The application was filed September 26, 2025.


What changed

MEDIATEK INC. filed a patent application disclosing a method for accelerating LLM inference procedures. The method generates draft tokens through a first drafting procedure, determines from information about those drafts whether an acceleration rule is met, and, if the rule is not met, performs a second drafting procedure to generate additional draft tokens. Formal draft tokens, based at least on the second draft tokens, are then input to the LLM to generate target tokens, and a matching operation between the formal draft tokens and the target tokens produces the output tokens. The application is classified under CPC G06N 5/04 and G06F 40/284 and carries Application No. 19340885.

Technology companies developing or deploying large language models should monitor this patent's prosecution. While patent applications create no immediate compliance obligations, if granted, the technique may become relevant for companies implementing LLM inference acceleration. Patent prosecution typically spans 2-3 years before a grant or rejection decision. No action is required at this stage.

Source document (simplified)


METHOD FOR PERFORMING ACCELERATION PROCEDURE TO ACCELERATE INFERENCE PROCEDURE OF LARGE LANGUAGE MODEL

Application US20260094028A1 Kind: A1 Apr 02, 2026

Assignee

MEDIATEK INC.

Inventors

Yue-Ting Pan, Huai-Ting Li, Yi-Min Tsai, Ya-Lin Huang, I-Lin Chen

Abstract

A method for performing an acceleration procedure to accelerate an inference procedure of a large language model (LLM) includes: performing a first drafting procedure to generate multiple first draft tokens; according to first draft information related to the multiple first draft tokens, determining whether a first rule is met to generate a first determination result, wherein the first rule corresponds to the first acceleration procedure; in response to the first determination result indicating that the first rule is not met, performing a second drafting procedure to generate multiple second draft tokens; obtaining multiple formal draft tokens at least based on the multiple second draft tokens; inputting the multiple formal draft tokens to the LLM in order to generate multiple target tokens; and performing a matching operation upon the multiple formal draft tokens and the multiple target tokens to generate at least one output token of the LLM.
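
Illustrative sketch (not from the application)

The control flow described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the applicant's implementation. Every name here (match_tokens, accelerated_step, the drafter and rule callables, toy_llm_verify) is hypothetical. The abstract does not say what the first rule checks, what happens when the rule is met, or how the matching operation works. The sketch assumes a length threshold for the rule, reuses the first draft when the rule is met, and uses the longest-matching-prefix acceptance common in speculative decoding.

```python
from typing import Callable, List, Sequence

Token = int


def match_tokens(formal_draft: Sequence[Token], target: Sequence[Token]) -> List[Token]:
    """Assumed matching operation: keep draft tokens while they agree with the
    LLM's target tokens; at the first mismatch, emit the target token and stop."""
    accepted: List[Token] = []
    for draft_tok, target_tok in zip(formal_draft, target):
        if draft_tok != target_tok:
            accepted.append(target_tok)  # the LLM's token replaces the rejected draft
            break
        accepted.append(draft_tok)
    return accepted


def accelerated_step(
    context: List[Token],
    first_drafter: Callable[[List[Token]], List[Token]],
    second_drafter: Callable[[List[Token]], List[Token]],
    first_rule_met: Callable[[List[Token]], bool],
    llm_verify: Callable[[List[Token], List[Token]], List[Token]],
) -> List[Token]:
    """One acceleration step: draft, check the rule, redraft if needed, verify, match."""
    first_draft = first_drafter(context)
    if first_rule_met(first_draft):
        # Assumption: the abstract only specifies the "rule not met" branch.
        formal_draft = first_draft
    else:
        # Formal draft tokens are "at least based on" the second draft tokens;
        # here they are simply the second draft.
        formal_draft = second_drafter(context)
    target = llm_verify(context, formal_draft)
    return match_tokens(formal_draft, target)


def toy_llm_verify(context: List[Token], draft: List[Token]) -> List[Token]:
    """Toy stand-in for the LLM: the 'correct' next token is (previous + 1) % 10.
    Returns one target token per draft position, as a single verification pass would."""
    targets: List[Token] = []
    prev = context[-1]
    for tok in draft:
        targets.append((prev + 1) % 10)
        prev = tok
    return targets


# Hypothetical rule: the first draft must contain at least 3 tokens.
rule = lambda draft: len(draft) >= 3

# First draft is too short, so the second drafting procedure supplies the formal draft.
out_redrafted = accelerated_step(
    [1, 2, 3], lambda ctx: [4], lambda ctx: [4, 5, 9], rule, toy_llm_verify
)

# First draft satisfies the rule, so the second drafter is never consulted.
out_direct = accelerated_step(
    [1, 2, 3], lambda ctx: [4, 5, 6, 7], lambda ctx: [0], rule, toy_llm_verify
)
```

In the first call the third draft token (9) disagrees with the toy model's target (6), so the step returns the two accepted drafts followed by the model's own token. In the second call every draft token matches and all four are accepted. Either way the step yields at least one output token for a non-empty draft, consistent with the abstract's wording.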