USPTO Patent Grant for Gated Linear Contextual Bandits

ChangeBridge: Patent Grants - AI & Computing (G06N)

Published March 24th, 2026

Detected March 25th, 2026

Summary

The USPTO has granted patent US12585912B2, titled 'Gated linear contextual bandits', to GDM Holding LLC. The patent covers methods and systems for training neural networks to control agents in real-world environments. Training combines a task-specific objective in simulation with joint self-supervised and task-specific training on real-world data.


What changed

The United States Patent and Trademark Office (USPTO) has issued patent grant US12585912B2, titled 'Gated linear contextual bandits' and assigned to GDM Holding LLC. It describes methods and systems for training a neural network to control a real-world agent in two stages. First, the network is trained on a task-specific objective that measures how well it controls a simulated version of the agent. Second, starting from those parameter values, it is trained on data from the real agent's interactions with its environment by jointly optimizing a self-supervised objective and a second task-specific objective.

This patent grant is primarily an intellectual property event and imposes no direct compliance obligations on regulated entities. However, companies in the AI and machine learning space, particularly those developing neural-network control systems for physical agents or training such systems in simulation before adapting them to real-world data, should be aware of this grant. It may affect future innovation and licensing strategies in the field.

Source document (simplified)


Gated linear contextual bandits

Grant: US12585912B2 | Kind: B2 | Date: Mar 24, 2026

Assignee

GDM Holding LLC

Inventors

Eren Sezener, Joel William Veness, Marcus Hutter, Jianan Wang, David Budden

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for training a neural network to control a real-world agent interacting with a real-world environment to cause the real-world agent to perform a particular task. One of the methods includes training the neural network to determine first values of the parameters by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; obtaining real-world data generated from interactions of the real-world agent with the real-world environment; and training the neural network to determine trained values of the parameters from the first values of the parameters by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the neural network on a self-supervised task performed on the real-world data and (ii) a second task-specific objective.
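The two-stage procedure in the abstract can be illustrated with a toy sketch. This is not the patented method, only a dependency-free illustration under stated assumptions. Everything concrete in it is hypothetical: the one-unit linear encoder and policy head, the input-reconstruction task standing in for the unspecified self-supervised task, the assumption that the real-world data carries target actions for the second task-specific objective, the weight `ssl_weight`, the simulated and real "gains" 2.0 and 2.5 used to fake a sim-to-real gap, and every function name. Gradients are computed by central finite differences to avoid any library dependency.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Data = List[Tuple[float, float]]  # (observation x, target action y)


def forward(p: Params, x: float) -> Tuple[float, float]:
    """Return (internal representation h, action a) for observation x."""
    h = p["enc"] * x        # hypothetical one-unit linear encoder
    return h, p["pol"] * h  # hypothetical linear policy head


def task_loss(p: Params, data: Data) -> float:
    """Task-specific objective (assumed): mean squared error to target actions."""
    return sum((forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)


def self_supervised_loss(p: Params, data: Data) -> float:
    """Self-supervised objective (assumed): reconstruct x from h; labels unused."""
    return sum((p["dec"] * forward(p, x)[0] - x) ** 2 for x, _ in data) / len(data)


def grad(loss: Callable[[Params], float], p: Params, keys: List[str],
         eps: float = 1e-5) -> Params:
    """Central finite-difference gradient of `loss` w.r.t. the listed keys."""
    g = {}
    for k in keys:
        up, down = dict(p), dict(p)
        up[k] += eps
        down[k] -= eps
        g[k] = (loss(up) - loss(down)) / (2 * eps)
    return g


def gradient_descent(loss: Callable[[Params], float], p: Params,
                     keys: List[str], lr: float = 0.05,
                     steps: int = 1000) -> Params:
    """Plain gradient descent on the listed keys; other keys stay fixed."""
    p = dict(p)
    for _ in range(steps):
        g = grad(loss, p, keys)
        for k in keys:
            p[k] -= lr * g[k]
    return p


def make_data(gain: float) -> Data:
    """Toy data: 21 evenly spaced observations with target action gain * x."""
    xs = [i / 10 for i in range(-10, 11)]
    return [(x, gain * x) for x in xs]


def train(sim_data: Data, real_data: Data,
          ssl_weight: float = 0.5) -> Tuple[Params, Params]:
    init = {"enc": 0.5, "pol": 0.5, "dec": 0.5}
    # Stage 1: "first values" from a task objective on simulated data only.
    first = gradient_descent(lambda p: task_loss(p, sim_data), init,
                             ["enc", "pol"])

    # Stage 2: start from the first values and jointly optimize the
    # self-supervised objective and a second task objective on real data.
    def joint(p: Params) -> float:
        return (ssl_weight * self_supervised_loss(p, real_data)
                + task_loss(p, real_data))

    trained = gradient_descent(joint, first, ["enc", "pol", "dec"])
    return first, trained


if __name__ == "__main__":
    sim, real = make_data(2.0), make_data(2.5)  # assumed sim-to-real gap
    first, trained = train(sim, real)
    print("real task loss after stage 1:", task_loss(first, real))
    print("real task loss after stage 2:", task_loss(trained, real))
```

In this toy, stage 1 fits the simulator's gain, so it leaves a residual error on the real data. Stage 2 removes that error while also driving the reconstruction loss toward zero. Starting stage 2 from the stage-1 parameters, rather than from scratch, mirrors the abstract's "trained values ... from the first values" wording.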