Single Trajectory Policy Optimization for Generative Machine Learning Models

ChangeBridge: Patent Apps - AI & Computing (G06N)

Published March 26th, 2026

Detected March 31st, 2026

Summary

The USPTO published patent application US20260087409A1, 'Single Trajectory Policy Optimization for Generative Machine Learning Models', which names 15 inventors. The application covers methods for training generative ML models by optimizing an objective function based on the likelihoods of generated data items and on the difference between their quality scores and expected quality scores. Patent applications are informational publications and do not impose regulatory obligations.


What changed

The USPTO published a patent application (US20260087409A1) disclosing methods for training generative machine learning models using single trajectory policy optimization. At each training iteration, the method obtains training examples that each include a prompt, a data item, and a quality score for that data item; determines the likelihood of the model generating each data item; determines an expected quality score for each example; and trains the model to optimize an objective function that depends on those likelihoods and on the difference between each quality score and its expected quality score. The application was filed on 2025-05-22 under application number 19216677.

Patent applications do not create compliance obligations or deadlines. This is an informational publication indicating the technology has been disclosed and is under examination. Companies developing generative AI systems may want to monitor this application, since a published application can serve as prior art for policy optimization techniques, but no immediate action is required. Patent prosecution typically takes 2-3 years before issuance or rejection.

Source document (simplified)


SINGLE TRAJECTORY POLICY OPTIMIZATION FOR GENERATIVE MACHINE LEARNING MODELS

Application US20260087409A1 Kind: A1 Mar 26, 2026

Inventors

Bilal Piot, Pierre Richemond, Yunhao Tang, Daniele Calandriello, Zhaohan Guo, Gil Shamir, Tianqi Liu, Rishabh Joshi, Lior Shani, Eugene Tarassov, Remi Munos, Bernardo Avila Pires, Lucas Joseph Spangher, Mohammad Gheshlaghi Azar, Rafael Mitkov Rafailov

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a generative machine learning model to perform a machine learning task. In one aspect, a method comprises, at each of a sequence of training iterations for a target generative model: obtaining a plurality of training examples that each include an example prompt, an example data item, and a quality score for the example data item; determining likelihoods of the target generative machine learning model generating the example data items for the training examples; determining expected quality scores for the training examples; and training the target generative machine learning model to optimize an objective function that depends on the likelihoods of the target generative machine learning model generating the example data items for the training examples and a difference between the quality scores and the expected quality scores for the training examples.
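
The abstract does not give the exact form of the objective function. As a rough illustration only, the sketch below assumes one common choice from the policy-gradient literature: the mean of (quality score minus expected quality score) multiplied by the log-likelihood of the data item. The names `TrainingExample`, `surrogate_objective`, `objective_for_batch`, `log_likelihood_fn`, and `expected_score_fn` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TrainingExample:
    """One training example: a prompt, a data item, and its quality score."""
    prompt: str
    data_item: str
    quality_score: float


def surrogate_objective(
    log_likelihoods: Sequence[float],
    quality_scores: Sequence[float],
    expected_scores: Sequence[float],
) -> float:
    """Mean of (quality - expected quality) * log-likelihood.

    ASSUMPTION: this REINFORCE-with-baseline form is an illustrative
    choice, not the objective claimed in the application. Larger is better.
    """
    n = len(log_likelihoods)
    if n == 0:
        raise ValueError("need at least one training example")
    if len(quality_scores) != n or len(expected_scores) != n:
        raise ValueError("all inputs must have the same length")
    terms = [
        (q - e) * ll
        for ll, q, e in zip(log_likelihoods, quality_scores, expected_scores)
    ]
    return sum(terms) / n


def objective_for_batch(
    examples: Sequence[TrainingExample],
    log_likelihood_fn: Callable[[str, str], float],
    expected_score_fn: Callable[[str], float],
) -> float:
    """Evaluate the assumed objective for a batch of examples.

    log_likelihood_fn(prompt, data_item) stands in for the target model's
    log-likelihood; expected_score_fn(prompt) stands in for whatever
    estimator produces the expected quality score. Both are hypothetical.
    """
    lls = [log_likelihood_fn(ex.prompt, ex.data_item) for ex in examples]
    qs = [ex.quality_score for ex in examples]
    es = [expected_score_fn(ex.prompt) for ex in examples]
    return surrogate_objective(lls, qs, es)
```

Under this assumed form, a data item that scores above its expected quality contributes a term that grows as the model's likelihood of generating it rises, while a below-expectation item contributes a term that grows as its likelihood falls. In practice the log-likelihoods would come from the model itself and the objective would be differentiated with an autodiff framework; the sketch only computes the scalar value.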