Routine Notice Added Final

USPTO Patent Grant for Policy Neural Network Agent Control

ChangeBridge: Patent Grants - AI & Computing (G06N)
Published March 24th, 2026
Detected March 25th, 2026

Summary

The USPTO has granted patent US12585941B2 to GDM Holding LLC for a method of training a policy neural network to control an agent. The patent, filed on January 7, 2022, details a process involving best response policy iteration and updating the neural network based on generated training data.

What changed

The United States Patent and Trademark Office (USPTO) has issued patent US12585941B2, titled "Training a policy neural network for controlling an agent using best response policy iteration," to GDM Holding LLC. The patent was filed on January 7, 2022, and granted on March 24, 2026. It describes a method that trains a policy neural network by updating it at each of a plurality of training iterations. At each iteration, training data is generated by controlling the agent with an improved policy, and a best response computation is performed using a candidate policy derived from the policy neural networks of one or more preceding iterations together with a candidate value neural network.

This patent grant is primarily an intellectual property matter and does not impose direct regulatory obligations on businesses. Companies developing AI systems, particularly those that use neural networks for agent control or policy optimization, may wish to review the patent's 20 claims to understand the scope of the granted rights and to assess whether their own development activities could infringe.

Source document (simplified)

Training a policy neural network for controlling an agent using best response policy iteration

Grant US12585941B2 Kind: B2 Mar 24, 2026

Assignee

GDM Holding LLC

Inventors

Thomas William Anthony, Thomas Edward Eccles, Andrea Tacchetti, János Kramár, Ian Michael Gemp, Thomas Chalmers Hudson, Nicolas Pierre Mickaël Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Thore Kurt Hartwig Graepel, Yoram Bachrach

Abstract

Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a policy neural network by repeatedly updating the policy neural network at each of a plurality of training iterations. One of the methods includes generating training data for the training iteration by controlling the agent in accordance with an improved policy that selects actions in response to input state representations. A best response computation is performed using (i) a candidate policy generated from respective policy neural networks as of one or more preceding iterations and (ii) a candidate value neural network. The candidate value neural network is configured to generate a value output that is an estimate of a value of the environment being in the state characterized by a state representation to complete a particular task. The policy neural network is updated by training the policy neural network on the training data.
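To make the loop in the abstract more concrete, the following is a minimal sketch on a toy problem, not the patented method itself. It assumes a one-shot rock-paper-scissors game, a tabular softmax policy in place of a policy neural network, and an exact payoff lookup in place of the candidate value neural network. Every name here (`PAYOFF`, `value_estimate`, `best_response`, `train`) is hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy setting: rock (0), paper (1), scissors (2).
# PAYOFF[my_action][opp_action] is the payoff to the agent being trained.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N_ACTIONS = 3


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def value_estimate(my_action, opp_action):
    """Assumed stand-in for a candidate value network: the exact payoff."""
    return PAYOFF[my_action][opp_action]


def best_response(candidate_policies, rng, n_samples=32):
    """Sampled best response: pick the action with the highest total value
    against opponent actions drawn from the candidate policies."""
    totals = [0.0] * N_ACTIONS
    for _ in range(n_samples):
        opp_policy = rng.choice(candidate_policies)
        opp_action = rng.choices(range(N_ACTIONS), weights=opp_policy)[0]
        for a in range(N_ACTIONS):
            totals[a] += value_estimate(a, opp_action)
    return max(range(N_ACTIONS), key=lambda a: totals[a])


def train(n_iterations=50, episodes_per_iter=64, lr=0.5, seed=0):
    """Repeat: act with the improved policy, then fit the policy to that data."""
    rng = random.Random(seed)
    logits = [0.0] * N_ACTIONS
    checkpoints = [softmax(logits)]  # policies as of preceding iterations
    for _ in range(n_iterations):
        # Generate training data by acting with the improved (best response) policy.
        data = [best_response(checkpoints, rng) for _ in range(episodes_per_iter)]
        # Update the policy with one cross-entropy gradient step toward the data.
        probs = softmax(logits)
        for a in range(N_ACTIONS):
            target = data.count(a) / len(data)
            logits[a] += lr * (target - probs[a])
        checkpoints.append(softmax(logits))
    return softmax(logits), checkpoints


if __name__ == "__main__":
    policy, _ = train()
    print([round(p, 3) for p in policy])
```

In this sketch the candidate policies are simply all earlier checkpoints, and the best response is approximated by sampling. The abstract does not specify either choice, so both should be read as assumptions.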

CPC Classifications

G06N 3/08

Filing Date

2022-01-07

Application No.

17570870

Claims

20

Classification

Agency
USPTO
Published
March 24th, 2026
Instrument
Notice
Legal weight
Non-binding
Stage
Final
Change scope
Minor
Document ID
US12585941B2

Who this affects

Industry sector
5112 Software & Technology
Activity scope
AI Development
Geographic scope
United States US

Taxonomy

Primary area
Intellectual Property
Operational domain
IT Security
Topics
Artificial Intelligence Machine Learning
