Single Trajectory Policy Optimization for Generative Machine Learning Models
Summary
The USPTO published patent application US20260087409A1, 'Single Trajectory Policy Optimization for Generative Machine Learning Models', which names 15 inventors. The application covers methods for training generative ML models by optimizing an objective function that depends on the likelihoods of generated data items and on the difference between their quality scores and expected quality scores. Patent applications are informational publications and do not impose regulatory obligations.
What changed
The USPTO published a patent application (US20260087409A1) disclosing methods for training generative machine learning models using single trajectory policy optimization. At each training iteration, the method obtains training examples that each include a prompt, a data item, and a quality score; determines the likelihood of the model generating each data item; determines an expected quality score for each example; and trains the model to optimize an objective function that depends on those likelihoods and on the difference between the quality scores and the expected quality scores. The application was filed on 2025-05-22 under application number 19216677.
Patent applications do not create compliance obligations or deadlines. This is an informational publication indicating that the technology has been disclosed and the application is pending. Companies developing generative AI systems should monitor this application because a published application can be cited as prior art in policy optimization techniques, but no immediate action is required. Patent prosecution typically takes 2-3 years before issuance or rejection.
Source document (simplified)
SINGLE TRAJECTORY POLICY OPTIMIZATION FOR GENERATIVE MACHINE LEARNING MODELS
Application: US20260087409A1 | Kind: A1 | Mar 26, 2026
Inventors
Bilal Piot, Pierre Richemond, Yunhao Tang, Daniele Calandriello, Zhaohan Guo, Gil Shamir, Tianqi Liu, Rishabh Joshi, Lior Shani, Eugene Tarassov, Remi Munos, Bernardo Avila Pires, Lucas Joseph Spangher, Mohammad Gheshlaghi Azar, Rafael Mitkov Rafailov
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a generative machine learning model to perform a machine learning task. In one aspect, a method comprises at each of a sequence of training iterations for a target generative model: obtaining a plurality of training examples that each include an example prompt, an example data item, and a quality score for the example data item; determining likelihoods of the target generative machine learning model generating the example data items for the training examples; determining expected quality scores for the training examples; and training the target generative machine learning model to optimize an objective function that depends on the likelihoods of the target generative machine learning model generating the example data items for the training examples and a difference between the quality scores and the expected quality scores for the training examples.
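The four steps in the abstract can be illustrated with a toy sketch. The abstract does not state the exact form of the objective or how expected quality scores are computed, so everything below is an assumption made for illustration: the model is a small table of logits (one row per prompt, one column per candidate data item), the expected quality score is a running average per prompt, and the objective is a policy-gradient-style surrogate that weights each log-likelihood by (quality score minus expected quality score). The names `train_step`, `log_softmax`, `baselines`, and `baseline_decay` are hypothetical and do not come from the application.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-probabilities for one row of logits."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def train_step(logits, examples, baselines, lr=0.5, baseline_decay=0.9):
    """One training iteration over (prompt, item, quality_score) examples.

    logits:    [num_prompts, num_items] toy tabular stand-in for the model
    baselines: [num_prompts] assumed running estimate of expected quality
    Returns updated copies of both arrays.
    """
    grad = np.zeros_like(logits)
    for prompt, item, score in examples:
        # Likelihood of the model generating each item for this prompt.
        probs = np.exp(log_softmax(logits[prompt]))
        # Difference between the quality score and the expected quality score.
        advantage = score - baselines[prompt]
        # Gradient of log pi(item | prompt) w.r.t. logits is one_hot - probs.
        one_hot = np.zeros_like(probs)
        one_hot[item] = 1.0
        grad[prompt] += advantage * (one_hot - probs)

    # Gradient ascent on the assumed surrogate: mean(advantage * log-likelihood).
    new_logits = logits + lr * grad / len(examples)

    # Assumed estimator: exponential moving average of observed scores.
    new_baselines = baselines.copy()
    for prompt, _, score in examples:
        new_baselines[prompt] = (
            baseline_decay * new_baselines[prompt] + (1 - baseline_decay) * score
        )
    return new_logits, new_baselines


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_quality = np.array([0.0, 0.5, 1.0])  # made-up scores for 3 items
    logits, baselines = np.zeros((1, 3)), np.zeros(1)
    for _ in range(300):
        probs = np.exp(log_softmax(logits[0]))
        items = rng.choice(3, size=8, p=probs)
        batch = [(0, int(i), float(true_quality[i])) for i in items]
        logits, baselines = train_step(logits, batch, baselines)
    print(np.round(np.exp(log_softmax(logits[0])), 3))
```

In this sketch, an example whose quality score exceeds the expected score pushes the likelihood of its data item up, and one that falls short pushes it down. A real generative model would replace the logit table with a neural network and the hand-written gradient with automatic differentiation.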
CPC Classifications
G06N 20/00
Filing Date
2025-05-22
Application No.
19216677