Bridging LLMs of Differing Sizes to Reduce Latency
Summary
USPTO published patent application US20260099528A1 by inventor Brett Barros for methods of reducing LLM latency by using a smaller LLM to generate immediate responses while a larger LLM produces refined content starting from the smaller model's output. The larger model generates a refined portion succeeding the initial content, which can then be rendered to the user. Alternative implementations use default text strings or predefined templates selected via natural language understanding of the user query.
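As a rough illustration of the claimed flow (the smaller model drafts, a portion is rendered at once, and the larger model continues from that portion), here is a minimal Python sketch. The model functions and sample strings are invented stand-ins for illustration, not APIs or text from the filing.

```python
# Hypothetical sketch of the bridging flow described in the application.
# The two model functions are canned stand-ins, not real LLM calls.

def small_llm(query: str) -> str:
    """Stand-in for the low-latency smaller LLM's initial content."""
    return "Paris is the capital of France. It is a big city."

def large_llm(query: str, prefix: str) -> str:
    """Stand-in for the larger LLM, which generates content that starts
    with the already-rendered prefix and appends a refined portion."""
    return prefix + " It sits on the Seine and is home to about 2.1 million people."

def respond(query: str, render) -> str:
    # 1. The smaller LLM produces initial content quickly.
    draft = small_llm(query)
    # 2. A portion of that content (here, the first sentence) is rendered
    #    immediately as the response to the user query.
    portion = draft.split(". ")[0] + "."
    render(portion)
    # 3. The larger LLM generates refined content beginning with the
    #    rendered portion; only the refined suffix is rendered after it.
    refined = large_llm(query, portion)
    render(refined[len(portion):])
    return refined

chunks = []
answer = respond("What is the capital of France?", chunks.append)
```

Because the larger model's output is constrained to begin with the already-rendered portion, the refinement can be appended without rewriting text the user has already seen.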
What changed
USPTO published patent application US20260099528A1 for LLM latency reduction technology. The application discloses methods where a smaller LLM generates initial content responsive to user queries, allowing immediate rendering of a portion as a response. A larger LLM then generates refined content beginning with that portion and including additional refined content. Alternative embodiments describe using default text strings or templates selected via natural language understanding instead of a smaller LLM.
Technology companies developing LLM-based applications or chatbots may benefit from reviewing this patent filing to understand potential claims around latency reduction techniques. The application has no immediate compliance implications as it represents a patent application rather than a granted patent.
Archived snapshot
Apr 18, 2026: GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
LLM LATENCY REDUCTION VIA BRIDGING MULTIPLE LLMS OF DIFFERING SIZES
Publication US20260099528A1 (Kind Code A1), published Apr 09, 2026
Inventors
Brett Barros
Abstract
Implementations utilize a smaller LLM to generate content responsive to a user query and cause a portion of the generated content to be rendered as an immediate response to the user query. Implementations further utilize a larger LLM to generate content that starts with the portion of the generated content and that includes a refined portion succeeding the portion of the generated content. The refined portion can be rendered succeeding the portion of the generated content. In some implementations, instead of using the smaller LLM, the portion of the generated content rendered as the immediate response can be generated based on a default text string or a template, where the template is selected from a plurality of predefined templates based on a natural language understanding of the user query.
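The alternative embodiment, which selects a predefined template via natural language understanding of the query rather than invoking a smaller LLM, might look like the minimal sketch below. The intents, template strings, and keyword matching are invented for illustration and do not appear in the filing.

```python
# Hypothetical sketch of the template-based alternative: a predefined
# template, selected by a crude intent classifier, supplies the
# immediately rendered portion instead of a smaller LLM's output.

TEMPLATES = {
    "weather": "Here is the forecast you asked about:",
    "definition": "Here is a definition:",
    "default": "Let me look into that.",
}

def classify_intent(query: str) -> str:
    """Toy natural-language-understanding step (keyword matching)."""
    q = query.lower()
    if "weather" in q or "forecast" in q:
        return "weather"
    if q.startswith(("what is", "define")):
        return "definition"
    return "default"

def immediate_response(query: str) -> str:
    """Select the template rendered while the larger LLM runs."""
    return TEMPLATES[classify_intent(query)]
```

For example, `immediate_response("What is entropy?")` would return the definition template, which is rendered at once while the larger model generates the full answer.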
CPC Classifications
G06F 16/3344 G06F 16/338 G06F 40/289 G06F 40/35 G06N 3/0475
Filing Date
2025-12-11
Application No.
19416474
Related changes
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The summary, classification, recommended actions, deadlines, and penalty information are AI-generated from the original text and may contain errors. Always verify against the source document.