Routine Notice Added Draft

Single Trajectory Policy Optimization for Generative Machine Learning Models

ChangeBridge: Patent Apps - AI & Computing (G06N)
Published March 26th, 2026
Detected March 31st, 2026

Summary

The USPTO published patent application US20260087409A1, 'Single Trajectory Policy Optimization for Generative Machine Learning Models', which names 15 inventors. The application covers methods for training generative ML models by optimizing an objective function that depends on the likelihoods of generated data items and on how far their quality scores differ from expected quality scores. Patent applications are informational publications and do not impose regulatory obligations.

What changed

The USPTO published a patent application (US20260087409A1) disclosing methods for training generative machine learning models using single trajectory policy optimization. At each training iteration, the method obtains training examples that each include a prompt, a data item, and a quality score; determines the likelihood of the model generating each data item; determines an expected quality score for each example; and trains the model to optimize an objective function that depends on those likelihoods and on the difference between the quality scores and the expected quality scores. The application was filed on 2025-05-22 under application number 19216677.

Patent applications do not create compliance obligations or deadlines. This is an informational publication indicating the technology has been disclosed and is under examination. Companies developing generative AI systems should monitor this application as it may indicate prior art in policy optimization techniques, but no immediate action is required. Patent prosecution typically takes 2-3 years before issuance or rejection.

Source document (simplified)

SINGLE TRAJECTORY POLICY OPTIMIZATION FOR GENERATIVE MACHINE LEARNING MODELS

Application US20260087409A1 Kind: A1 Mar 26, 2026

Inventors

Bilal Piot, Pierre Richemond, Yunhao Tang, Daniele Calandriello, Zhaohan Guo, Gil Shamir, Tianqi Liu, Rishabh Joshi, Lior Shani, Eugene Tarassov, Remi Munos, Bernardo Avila Pires, Lucas Joseph Spangher, Mohammad Gheshlaghi Azar, Rafael Mitkov Rafailov

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a generative machine learning model to perform a machine learning task. In one aspect, a method comprises at each of a sequence of training iterations for a target generative model: obtaining a plurality of training examples that each include an example prompt, an example data item, and a quality score for the example data item; determining likelihoods of the target generative machine learning model generating the example data items for the training examples; determining expected quality scores for the training examples; and training the target generative machine learning model to optimize an objective function that depends on the likelihoods of the target generative machine learning model generating the example data items for the training examples and a difference between the quality scores and the expected quality scores for the training examples.
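
The abstract does not give the functional form of the objective, so the following is only an illustrative sketch of the four steps it lists, not the applicants' method. It assumes an advantage-weighted log-likelihood objective, a toy "model" that is just a table of logits over a few candidate data items per prompt, and the batch-mean quality score as the expected quality score. The names `objective_and_grad`, `train`, `prompt_a`, and `prompt_b` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def objective_and_grad(logits, examples, expected):
    """Return the objective and its gradient with respect to the logits.

    logits:   dict mapping prompt -> list of logits (toy stand-in for a model)
    examples: list of (prompt, item_index, quality_score)
    expected: list of expected quality scores, one per example
    ASSUMPTION: objective = mean((score - expected) * log p(item | prompt)).
    """
    objective = 0.0
    grad = {prompt: [0.0] * len(row) for prompt, row in logits.items()}
    n = len(examples)
    for (prompt, item, score), baseline in zip(examples, expected):
        log_probs = log_softmax(logits[prompt])  # likelihood of the data item
        advantage = score - baseline             # quality minus expected quality
        objective += advantage * log_probs[item] / n
        # d log p[item] / d logit[k] = 1[k == item] - p[k]
        for k, lp in enumerate(log_probs):
            indicator = 1.0 if k == item else 0.0
            grad[prompt][k] += advantage * (indicator - math.exp(lp)) / n
    return objective, grad


def train(logits, examples, steps=100, lr=0.5):
    """Plain gradient ascent on the objective; returns a new logits table."""
    logits = {prompt: list(row) for prompt, row in logits.items()}
    # ASSUMPTION: the batch-mean score stands in for the expected quality score.
    mean_score = sum(score for _, _, score in examples) / len(examples)
    expected = [mean_score] * len(examples)
    for _ in range(steps):
        _, grad = objective_and_grad(logits, examples, expected)
        for prompt, row in grad.items():
            logits[prompt] = [v + lr * g for v, g in zip(logits[prompt], row)]
    return logits


# One scored data item per prompt (a single trajectory each).
examples = [("prompt_a", 0, 1.0), ("prompt_b", 1, 0.0)]
initial = {"prompt_a": [0.0, 0.0], "prompt_b": [0.0, 0.0]}
trained = train(initial, examples)
prob_a0 = math.exp(log_softmax(trained["prompt_a"])[0])
prob_b1 = math.exp(log_softmax(trained["prompt_b"])[1])
```

Under these assumptions, the item that scored above the expected value becomes more likely (`prob_a0` rises above its initial 0.5) and the item that scored below it becomes less likely (`prob_b1` falls below 0.5). A real system would replace the logit table with a generative model's sequence log-likelihoods and would likely use a learned or per-prompt estimate of the expected quality score.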

CPC Classifications

G06N 20/00

Filing Date

2025-05-22

Application No.

19216677

Named provisions

Abstract, Inventors, CPC Classifications, Filing Date

Classification

Agency: USPTO
Published: March 26th, 2026
Instrument: Notice
Legal weight: Non-binding
Stage: Draft
Change scope: Minor
Document ID: US20260087409A1 / Application No. 19216677

Who this affects

Applies to: Technology companies
Industry sector: 5112 Software & Technology
Activity scope: Patent Filing, Machine Learning, Model Optimization
Geographic scope: United States (US)

Taxonomy

Primary area: Intellectual Property
Operational domain: Intellectual Property
Topics: Artificial Intelligence, Technology
