USPTO Patent Grant for Policy Neural Network Agent Control
Summary
The USPTO has granted patent US12585941B2 to GDM Holding LLC for a method of training a policy neural network to control an agent. The patent, filed on January 7, 2022, covers training by best response policy iteration: at each training iteration, training data is generated by controlling the agent under an improved policy, and the policy neural network is updated on that data.
What changed
The United States Patent and Trademark Office (USPTO) has issued patent US12585941B2, titled "Training a policy neural network for controlling an agent using best response policy iteration," to GDM Holding LLC. The patent was filed on January 7, 2022, and granted on March 24, 2026. It describes a method that trains a policy neural network by repeatedly updating it over a series of training iterations. At each iteration, training data is generated by controlling the agent in accordance with an improved policy, and a best response computation is performed using (i) a candidate policy generated from policy neural networks as of one or more preceding iterations and (ii) a candidate value neural network. The policy neural network is then trained on the generated data.
This patent grant is primarily an intellectual property matter and imposes no direct regulatory obligations on businesses. It does, however, signal innovation in AI and machine learning, specifically in agent control and policy optimization. Companies developing AI systems, particularly those that use neural networks for agent control, may wish to review the patent's claims to understand the scope of the granted rights and to assess whether their own development activities could infringe.
Source document (simplified)
Training a policy neural network for controlling an agent using best response policy iteration
Grant: US12585941B2, Kind: B2, Granted: Mar 24, 2026
Assignee
GDM Holding LLC
Inventors
Thomas William Anthony, Thomas Edward Eccles, Andrea Tacchetti, János Kramár, Ian Michael Gemp, Thomas Chalmers Hudson, Nicolas Pierre Mickaël Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Thore Kurt Hartwig Graepel, Yoram Bachrach
Abstract
Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a policy neural network by repeatedly updating the policy neural network at each of a plurality of training iterations. One of the methods includes generating training data for the training iteration by controlling the agent in accordance with an improved policy that selects actions in response to input state representations. A best response computation is performed using (i) a candidate policy generated from respective policy neural networks as of one or more preceding iterations and (ii) a candidate value neural network. The candidate value neural network is configured to generate a value output that is an estimate of a value of the environment being in the state characterized by a state representation to complete a particular task. The policy neural network is updated by training the policy neural network on the training data.
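As a rough illustration of the loop the abstract describes, the following toy sketch runs a best-response-style iteration on rock-paper-scissors. It is not the patented method: the game, the tabular stand-in for the policy neural network, the exact payoff lookup standing in for the candidate value neural network, the way the improved policy uses the best response, and all function names (`value_estimate`, `improved_policy_action`, `train`) are assumptions made for illustration only.

```python
import random

# Row player's payoff in rock-paper-scissors: PAYOFF[mine][theirs].
# Actions: 0 = rock, 1 = paper, 2 = scissors. (Toy environment, an assumption.)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def value_estimate(action, opponent_policy):
    """Stand-in for a candidate value network: the expected payoff of
    `action` against an opponent who plays `opponent_policy`."""
    return sum(p * PAYOFF[action][b] for b, p in enumerate(opponent_policy))


def improved_policy_action(candidate_policies, rng):
    """Best response step (assumed form): sample a candidate policy from
    preceding iterations, then pick the action with the highest estimate."""
    opponent = rng.choice(candidate_policies)
    values = [value_estimate(a, opponent) for a in range(3)]
    return max(range(3), key=lambda a: values[a])


def train(num_iterations=50, episodes_per_iteration=32, lr=0.5, seed=0):
    rng = random.Random(seed)
    policy = [1 / 3, 1 / 3, 1 / 3]   # tabular stand-in for the policy network
    checkpoints = [list(policy)]     # policies as of preceding iterations
    for _ in range(num_iterations):
        # 1. Generate training data by acting under the improved policy.
        targets = [improved_policy_action(checkpoints, rng)
                   for _ in range(episodes_per_iteration)]
        # 2. Update the policy toward the actions the improved policy chose.
        freqs = [targets.count(a) / len(targets) for a in range(3)]
        policy = [(1 - lr) * p + lr * f for p, f in zip(policy, freqs)]
        checkpoints.append(list(policy))
    return policy, checkpoints


if __name__ == "__main__":
    final_policy, _ = train()
    print([round(p, 3) for p in final_policy])
```

The sketch keeps every earlier policy as a candidate opponent, which is one simple way to read "one or more preceding iterations"; a real system would train neural networks on state representations rather than mix probability vectors.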
CPC Classifications
G06N 3/08
Filing Date
2022-01-07
Application No.
17570870
Claims
20