Memory-Efficient Draft Machine Learning Model
Summary
The USPTO published patent application US20260099673A1, titled 'Memory-Efficient Draft Machine Learning Model', on April 9, 2026. The application covers systems and methods for model training that use a linear layer, a decoding layer, a down-projection layer, and a token predictor to generate output tokens with reduced memory requirements. It addresses computational efficiency in neural network architectures by passing decoded features through a dimension-reducing projection before token prediction.
What changed
The USPTO published patent application US20260099673A1, which discloses a memory-efficient draft machine learning model architecture. In each iteration, the system passes an embedding through a linear layer, decodes the resulting features, applies a down-projection to reduce their dimensionality, and generates an output token with a token predictor. The down-projection layer reduces the features from a first set of dimensions to a smaller second set, with the aim of lowering memory usage during model training and inference.
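The memory effect of a down-projection can be illustrated with rough arithmetic. The sketch below is not taken from the application: it assumes the token predictor is a single bias-free linear map from features to vocabulary logits, and the sizes (VOCAB, FIRST_DIM, SECOND_DIM) and function names are hypothetical values chosen only for illustration.

```python
def predictor_weights(vocab_size, feature_dim):
    """Weight count of a hypothetical linear token predictor (no bias)."""
    return vocab_size * feature_dim


def weights_with_down_projection(vocab_size, first_dim, second_dim):
    """Weight count when a down-projection (first_dim -> second_dim)
    precedes a predictor that reads the smaller features."""
    down_projection = first_dim * second_dim
    return down_projection + predictor_weights(vocab_size, second_dim)


# Illustrative sizes only; the application does not state any of these.
VOCAB, FIRST_DIM, SECOND_DIM = 32_000, 4_096, 1_024

baseline = predictor_weights(VOCAB, FIRST_DIM)  # 131,072,000 weights
reduced = weights_with_down_projection(VOCAB, FIRST_DIM, SECOND_DIM)  # 36,962,304 weights

print(f"predictor on first dimensions:  {baseline:,}")
print(f"down-projection plus predictor: {reduced:,}")
print(f"ratio: {reduced / baseline:.3f}")
```

Under these assumed sizes the projected variant holds roughly 28% of the baseline weights. The actual saving depends on the real dimensions and on how the token predictor is built, neither of which the summary specifies.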
For technology companies and AI developers, this published application represents potential prior art to consider in ML infrastructure development. Organizations building draft model systems, speculative decoding pipelines, or memory-optimized neural network training systems should review the claims for freedom-to-operate considerations, keeping in mind that the kind code A1 denotes a published application rather than a granted patent. The application's focus on dimension reduction through down-projection layers may be particularly relevant to teams developing efficient inference systems or deploying large language models in resource-constrained environments.
Archived snapshot
Apr 18, 2026. GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
MEMORY-EFFICIENT DRAFT MACHINE LEARNING MODEL
Application: US20260099673A1 | Kind: A1 | Published: Apr 09, 2026
Inventors
Mingu LEE, Wonseok JEON, Junyoung PARK, Kanghoon YOON, Christopher LOTT
Abstract
Disclosed are systems, apparatuses, processes, and computer-readable media for model training. A device may process, using a linear layer, an embedding generated from a first output token and input features to generate first features, wherein the first output token is generated by a previous iteration of a token predictor and wherein the input features are generated by a previous iteration of a decoding layer. A device may process, using the decoding layer, the first features to generate second features having first dimensions. A device may process, using a down-projection layer, the second features to generate third features having second dimensions smaller than the first dimensions. A device may generate, using the token predictor and the third features, a second output token.
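The data flow in the abstract can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the applicants' implementation: the embedding is assumed to be a concatenation of a token embedding and the input features, the decoding layer is replaced by a single tanh-activated linear map, the token predictor is assumed to be a linear map followed by argmax, and all weights are random. The names DraftModelSketch, step, and draft_tokens are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DraftModelSketch:
    """Hypothetical stand-in for the four components named in the abstract."""

    def __init__(self, vocab_size=50, first_dim=16, second_dim=4, seed=0):
        rng = random.Random(seed)
        self.token_table = make_matrix(vocab_size, first_dim, rng)  # token -> vector
        self.linear = make_matrix(first_dim, 2 * first_dim, rng)    # linear layer
        self.decoder = make_matrix(first_dim, first_dim, rng)       # decoding layer stand-in
        self.down_proj = make_matrix(second_dim, first_dim, rng)    # down-projection layer
        self.predictor = make_matrix(vocab_size, second_dim, rng)   # token predictor

    def step(self, prev_token, input_features):
        # Embedding built from the previous output token and the input features
        # (list concatenation is an assumption; the abstract does not say how).
        embedding = self.token_table[prev_token] + input_features
        first = matvec(self.linear, embedding)                           # first features
        second = [math.tanh(v) for v in matvec(self.decoder, first)]     # first dimensions
        third = matvec(self.down_proj, second)                           # smaller second dimensions
        logits = matvec(self.predictor, third)
        next_token = max(range(len(logits)), key=logits.__getitem__)     # argmax
        return next_token, second, third


def draft_tokens(model, start_token, start_features, count):
    """Run `count` iterations, feeding each token and the decoding-layer
    output back in as the next iteration's inputs."""
    tokens, token, features = [], start_token, start_features
    for _ in range(count):
        token, features, _ = model.step(token, features)
        tokens.append(token)
    return tokens


model = DraftModelSketch()
print(draft_tokens(model, start_token=3, start_features=[0.0] * 16, count=4))
```

In this sketch the token predictor only ever sees the 4-dimensional third features, while the 16-dimensional second features are what get carried into the next iteration, matching the abstract's statement that the input features come from a previous iteration of the decoding layer.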
CPC Classifications
G06F 40/284, G06F 40/40, G06N 3/0455
Filing Date
2025-02-11
Application No.
19051081
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The summary, classification, recommended actions, deadlines, and penalty information are AI-generated from the original text and may contain errors. Always verify against the source document.