LLM Speculative Decoding for AI Inference Acceleration
Summary
The USPTO published patent application US20260093960A1 by inventors Yao Cui Fehlis and Jalal Uddin Mahmud, disclosing a method for accelerating large language model inference using speculative decoding. The invention uses a neural network to output speculative decoding parameters iteratively, minimizing cumulative runtime. The patent covers CPC classifications G06N 3/047, G06F 40/284, and G06N 3/092.
What changed
The patent application discloses a method for accelerating LLM inference through speculative decoding. In each iteration, a first neural network selects one set of speculative decoding parameters from multiple candidate sets, and speculative decoding with that set generates subsequent tokens that are appended to the tokens from the prompt or from the previous iteration's output. The runtime of each iteration is collected, and iterations repeat until the updated token sequence reaches a maximum length. The neural network is trained to output parameter sets that minimize the sum of runtimes across iterations. Application No. 18901142 was filed on September 30, 2024.
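The iterative loop described above can be sketched in plain Python. Everything here is illustrative: `select_params`, `speculative_decode`, and the candidate parameter sets (draft lengths) are hypothetical stand-ins, not names or values from the application itself.

```python
import random
import time

# Hypothetical candidate parameter sets the controller network could
# choose among (e.g., how many draft tokens to propose per step).
PARAM_SETS = [{"draft_len": k} for k in (2, 4, 8)]

def select_params(tokens):
    """Stand-in for the first neural network: pick one parameter set."""
    return random.choice(PARAM_SETS)

def speculative_decode(tokens, params):
    """Stand-in for draft-then-verify decoding: returns extended tokens."""
    return tokens + [0] * params["draft_len"]

def generate(prompt_tokens, max_len=32):
    tokens = list(prompt_tokens)
    runtimes = []
    while len(tokens) < max_len:
        params = select_params(tokens)           # network outputs a parameter set
        start = time.perf_counter()
        tokens = speculative_decode(tokens, params)
        runtimes.append(time.perf_counter() - start)  # collect per-iteration runtime
    # Training would adjust select_params to minimize sum(runtimes)
    # accumulated over many such rollouts.
    return tokens[:max_len], sum(runtimes)

out, total = generate([1, 2, 3])
```

The loop structure mirrors the claim: choose parameters, decode speculatively, record runtime, stop at the maximum token length.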
This patent application represents a technical disclosure in the AI acceleration space rather than a regulatory action. Technology companies developing LLMs, AI inference systems, or related hardware should review the claims to assess potential licensing implications or design-around considerations. No compliance actions or deadlines are associated with this document as it is a published patent application rather than a regulatory requirement.
Archived snapshot
On Apr 2, 2026, GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
LARGE LANGUAGE MODEL INFERENCING ACCELERATION TECHNIQUES
Application: US20260093960A1 | Kind: A1 | Published: Apr 02, 2026
Inventors
Yao Cui Fehlis, Jalal Uddin Mahmud
Abstract
A method includes generating a plurality of tokens from a prompt to a large language model (LLM). The method includes, in one or more iterations, using a first neural network to output a set of speculative decoding parameters selected from a plurality of sets of speculative decoding parameters. Additionally, in the one or more iterations, the method includes performing speculative decoding using the set of speculative decoding parameters to generate a subsequent plurality of tokens appended to the plurality of tokens from the prompt or from a previous iteration to generate an updated plurality of tokens and collecting a runtime of the speculative decoding. The one or more iterations are repeated until the updated plurality of tokens reaches a maximum token length. The first neural network is trained to output sets of speculative decoding parameters to minimize a sum of runtimes during the one or more iterations.
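The abstract's training objective (minimizing the sum of per-iteration runtimes) and the CPC class G06N 3/092 (reinforcement learning) suggest the controller could be trained with a policy-gradient update. Below is a minimal REINFORCE-style sketch under that assumption; the three candidate parameter sets, their simulated runtimes, the baseline, and the learning rate are all invented for illustration and are not from the application.

```python
import math
import random

random.seed(0)  # deterministic toy run

logits = [0.0, 0.0, 0.0]  # one logit per candidate parameter set

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def simulated_total_runtime(choice):
    """Stand-in for a measured rollout; pretend set 1 is fastest."""
    return [3.0, 1.0, 2.0][choice] + random.uniform(0.0, 0.1)

lr = 0.5
for _ in range(500):
    probs = softmax(logits)
    choice = random.choices(range(3), weights=probs)[0]
    reward = -simulated_total_runtime(choice)  # lower runtime = higher reward
    baseline = -2.0                            # fixed baseline for variance reduction
    for i in range(3):
        # Policy-gradient update for a categorical policy.
        grad = (1.0 if i == choice else 0.0) - probs[i]
        logits[i] += lr * (reward - baseline) * grad

best = max(range(3), key=lambda i: softmax(logits)[i])
```

After training, the policy concentrates on the parameter set with the lowest measured runtime, which is the behavior the claimed objective (minimizing the sum of runtimes) would induce.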
CPC Classifications
G06N 3/047 G06F 40/284 G06N 3/092
Filing Date
2024-09-30
Application No.
18901142
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The plain-English summary, classification, and "what to do next" steps are AI-generated from the original text. Cite the source document, not the AI analysis.