LLM Speculative Decoding for AI Inference Acceleration

ChangeBridge: Patent Apps - AI & Computing (G06N)

Summary

The USPTO published patent application US20260093960A1 by inventors Yao Cui Fehlis and Jalal Uddin Mahmud, disclosing a method for accelerating large language model (LLM) inference using speculative decoding. In each iteration, a neural network selects a set of speculative decoding parameters; the network is trained to minimize the cumulative runtime across iterations. The application is classified under CPC G06N 3/047, G06F 40/284, and G06N 3/092.

What changed

The patent application discloses a method for accelerating LLM inference through speculative decoding. In each iteration, a first neural network selects one of several sets of speculative decoding parameters, and speculative decoding with those parameters generates further tokens that are appended to the prompt tokens or to the previous iteration's output. The runtime of each iteration is collected, and iterations repeat until the updated token sequence reaches a maximum token length. The first neural network is trained to minimize the sum of runtimes across iterations. Application No. 18901142 was filed on September 30, 2024.
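The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicants' implementation: the parameter sets (`draft_len`), the `select_params` stand-in for the first neural network, the `speculative_decode` stub, and the truncation to the maximum length are all hypothetical, since the application summary does not specify them.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical candidate parameter sets; the summary does not say which
# speculative decoding parameters the claimed method actually selects.
PARAM_SETS: List[Dict[str, int]] = [
    {"draft_len": 2},
    {"draft_len": 4},
    {"draft_len": 8},
]


def select_params(tokens: Sequence[int]) -> Dict[str, int]:
    """Stand-in for the first neural network: picks a set by sequence length."""
    index = min(len(tokens) // 8, len(PARAM_SETS) - 1)
    return PARAM_SETS[index]


def speculative_decode(tokens: Sequence[int], params: Dict[str, int]) -> List[int]:
    """Stub for a speculative decoding call; returns the newly generated tokens."""
    start = tokens[-1] + 1 if tokens else 0
    return [start + i for i in range(params["draft_len"])]


def generate(
    prompt_tokens: Sequence[int],
    max_len: int,
    selector: Callable[[Sequence[int]], Dict[str, int]] = select_params,
    decoder: Callable[[Sequence[int], Dict[str, int]], List[int]] = speculative_decode,
) -> Tuple[List[int], List[float]]:
    """Repeat select -> decode -> time until the sequence reaches max_len."""
    tokens = list(prompt_tokens)
    runtimes: List[float] = []
    while len(tokens) < max_len:
        params = selector(tokens)              # choose one parameter set
        t0 = time.perf_counter()
        new_tokens = decoder(tokens, params)   # speculative decoding step
        runtimes.append(time.perf_counter() - t0)  # collect this runtime
        tokens.extend(new_tokens)              # append to the running sequence
    # Assumption: overshoot past max_len is simply truncated.
    return tokens[:max_len], runtimes


if __name__ == "__main__":
    out_tokens, out_runtimes = generate([0, 1, 2], max_len=20)
    print(len(out_tokens), "tokens in", len(out_runtimes), "iterations")
    print("sum of runtimes (the quantity to be minimized):", sum(out_runtimes))
```

In this sketch, `sum(out_runtimes)` is the quantity the selector would be trained to reduce; how that training is carried out is not described in the summary and is left out here.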

This patent application represents a technical disclosure in the AI acceleration space rather than a regulatory action. Technology companies developing LLMs, AI inference systems, or related hardware should review the claims to assess potential licensing implications or design-around considerations. No compliance actions or deadlines are associated with this document as it is a published patent application rather than a regulatory requirement.

Archived snapshot

Apr 2, 2026

GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.

LARGE LANGUAGE MODEL INFERENCING ACCELERATION TECHNIQUES

Application: US20260093960A1, Kind: A1, Published: Apr 02, 2026

Inventors

Yao Cui Fehlis, Jalal Uddin Mahmud

Abstract

A method includes generating a plurality of tokens from a prompt to a large language model (LLM). The method includes, in one or more iterations, using a first neural network to output a set of speculative decoding parameters selected from a plurality of sets of speculative decoding parameters. Additionally, in the one or more iterations, the method includes performing speculative decoding using the set of speculative decoding parameters to generate a subsequent plurality of tokens appended to the plurality of tokens from the prompt or from a previous iteration to generate an updated plurality of tokens, and collecting a runtime of the speculative decoding. The one or more iterations are repeated until the updated plurality of tokens reaches a maximum token length. The first neural network is trained to output sets of speculative decoding parameters to minimize a sum of runtimes during the one or more iterations.
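Read as an optimization problem, the training objective stated in the abstract can be written roughly as below. This is an interpretive sketch, and all notation is introduced here rather than taken from the application: $f_\theta$ is the first neural network with weights $\theta$, $x_t$ is the token sequence available at the start of iteration $t$ (with $x_1$ derived from the prompt), $\mathcal{P}$ is the plurality of candidate parameter sets, $r_t$ is the collected runtime of iteration $t$, and $L_{\max}$ is the maximum token length.

```latex
\min_{\theta} \; \sum_{t=1}^{T} r_t(p_t),
\qquad p_t = f_\theta(x_t) \in \mathcal{P},
\qquad T = \min\{\, t : |x_{t+1}| \ge L_{\max} \,\}
```

Here $x_{t+1}$ denotes the updated token sequence after iteration $t$, so $T$ is the number of iterations needed to reach the maximum token length.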

CPC Classifications

G06N 3/047, G06F 40/284, G06N 3/092

Filing Date

2024-09-30

Application No.

18901142

About this page

What is GovPing?

Every important government, regulator, and court update from around the world. One place. Real-time. Free.

What's from the agency?

Source document text, dates, docket IDs, and authority are extracted directly from USPTO.

What's AI-generated?

The plain-English summary, classification, and "what to do next" steps are AI-generated from the original text. Cite the source document, not the AI analysis.

Classification

Agency: USPTO
Published: April 2nd, 2026
Instrument: Notice
Legal weight: Non-binding
Stage: Final
Change scope: Minor
Document ID: US20260093960A1

Who this affects

Applies to: Technology companies, Manufacturers
Industry sector: 3341 Computer & Electronics Manufacturing; 5112 Software & Technology; 3345 Medical Device Manufacturing
Activity scope: Patent Application
Geographic scope: United States (US)

Taxonomy

Primary area: Artificial Intelligence
Operational domain: Legal
Topics: Intellectual Property, Software & Technology
