
Gated linear contextual bandits

Grant US12585912B2 Kind: B2 Mar 24, 2026

Assignee

GDM Holding LLC

Inventors

Eren Sezener, Joel William Veness, Marcus Hutter, Jianan Wang, David Budden

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for training a neural network to control a real-world agent interacting with a real-world environment to cause the real-world agent to perform a particular task. One of the methods includes training the neural network to determine first values of the parameters by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; obtaining real-world data generated from interactions of the real-world agent with the real-world environment; and training the neural network to determine trained values of the parameters from the first values of the parameters by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the neural network on a self-supervised task performed on the real-world data and (ii) a second task-specific objective.
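The two-stage procedure in the abstract can be illustrated with a toy sketch. This is not the claimed implementation: it assumes a linear encoder `W` standing in for the neural network, a linear task head `v`, input reconstruction through a decoder `D` as the self-supervised task, squared-error losses for both task-specific objectives, and synthetic arrays in place of simulated and real-world data. All names, shapes, and hyperparameters are hypothetical.

```python
import numpy as np


def losses_and_grads(params, x, y, ssl_weight):
    """Return the combined loss and its gradients for the toy linear model."""
    W, v, D = params["W"], params["v"], params["D"]
    n, d = x.shape
    z = x @ W              # shared internal representation
    err = z @ v - y        # task-specific residual
    rec = z @ D - x        # self-supervised (reconstruction) residual
    task_loss = np.mean(err ** 2)
    ssl_loss = np.mean(rec ** 2)
    # Gradient of the combined loss with respect to the representation z.
    dz = (2.0 / n) * np.outer(err, v) + ssl_weight * (2.0 / (n * d)) * rec @ D.T
    grads = {
        "W": x.T @ dz,
        "v": (2.0 / n) * z.T @ err,
        "D": ssl_weight * (2.0 / (n * d)) * z.T @ rec,
    }
    return task_loss + ssl_weight * ssl_loss, grads


def train(params, x, y, ssl_weight, steps=300, lr=0.02):
    """Plain gradient descent; records the loss before each update."""
    history = []
    for _ in range(steps):
        loss, grads = losses_and_grads(params, x, y, ssl_weight)
        history.append(loss)
        params = {name: params[name] - lr * grads[name] for name in params}
    return params, history


rng = np.random.default_rng(0)
w_true = rng.normal(size=4)

# Assumed stand-in for data from the simulated agent.
x_sim = rng.normal(size=(256, 4))
y_sim = x_sim @ w_true

# Assumed stand-in for real-world data: shifted inputs, noisy targets.
x_real = rng.normal(loc=0.3, scale=1.2, size=(64, 4))
y_real = x_real @ w_true + 0.1 * rng.normal(size=64)

params = {
    "W": 0.1 * rng.normal(size=(4, 3)),
    "v": 0.1 * rng.normal(size=3),
    "D": 0.1 * rng.normal(size=(3, 4)),
}

# Stage 1: first task-specific objective only, on simulated data.
params, sim_history = train(params, x_sim, y_sim, ssl_weight=0.0)

# Stage 2: start from the stage-1 values and jointly optimize the
# self-supervised objective and a second task-specific objective on real data.
params, real_history = train(params, x_real, y_real, ssl_weight=1.0)
```

In this sketch the second stage starts from the parameter values found in the first stage, and the self-supervised term shapes the shared representation `z` using only the real-world inputs. The equal weighting of the two stage-2 terms is an arbitrary choice made for illustration.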

CPC Classifications

G06N 3/006 G06N 3/063 G06N 3/045 G06N 3/048 G06N 7/01 G06N 3/088

Filing Date

2020-10-08

Application No.

17766854

Claims

20