Sample-efficient reinforcement learning

Grant US12579438B2 Kind: B2 Mar 17, 2026

Assignee

Google LLC

Inventors

Danijar Hafner, Jacob Buckman, Honglak Lee, Eugene Brevdo, George Jay Tucker

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sample-efficient reinforcement learning. One of the methods includes maintaining an ensemble of Q networks, an ensemble of transition models, and an ensemble of reward models; obtaining a transition; generating, using the ensemble of transition models, M trajectories; for each time step in each of the trajectories: generating, using the ensemble of reward models, N rewards for the time step, generating, using the ensemble of Q networks, L Q values for the time step, and determining, from the rewards, the Q values, and the training reward, L*N candidate target Q values for the trajectory and for the time step; for each of the time steps, combining the candidate target Q values; determining a final target Q value; and training at least one of the Q networks in the ensemble using the final target Q value.
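The target computation described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the claimed method: the function names, the nested-list layout of `rewards` and `q_values`, the discount factor `gamma`, and the stabilizer `eps` are all hypothetical. The abstract does not say how the candidates are combined or how the final target is chosen; the sketch assumes a per-time-step mean and variance, followed by inverse-variance weighting across time steps.

```python
from statistics import fmean, pvariance


def candidate_targets(training_reward, rewards, q_values, gamma):
    """Collect candidate target Q values for every time step.

    Assumed (hypothetical) layout, for M trajectories, N reward models,
    L Q networks and a rollout of H >= 1 imagined steps:
      rewards[m][j][n]  -- reward from reward model n for imagined step j
                           of trajectory m (j = 0..H-1)
      q_values[m][i][l] -- Q value from Q network l at time step i of
                           trajectory m (i = 0..H; step 0 is the next
                           state of the real transition)
    Returns a list whose i-th entry holds the M*N*L candidates for step i.
    """
    num_steps = len(q_values[0])
    candidates = [[] for _ in range(num_steps)]
    for traj_rewards, traj_qs in zip(rewards, q_values):
        for n in range(len(traj_rewards[0])):
            # Discounted return so far, starting from the real training reward.
            ret = training_reward
            for i, step_qs in enumerate(traj_qs):
                if i > 0:
                    ret += gamma ** i * traj_rewards[i - 1][n]
                # One candidate per Q network: L*N per trajectory and step.
                for q in step_qs:
                    candidates[i].append(ret + gamma ** (i + 1) * q)
    return candidates


def final_target(candidates, eps=1e-8):
    """Combine per-step candidates, then weight steps by inverse variance.

    The weighting rule is an assumption of this sketch.
    """
    means = [fmean(c) for c in candidates]
    inv_vars = [1.0 / (pvariance(c) + eps) for c in candidates]
    total = sum(inv_vars)
    return sum(w * m for w, m in zip(inv_vars, means)) / total
```

With this weighting, time steps where the ensemble members disagree contribute less to the final target, so a long imagined rollout is only trusted as far as the learned models agree with each other. The resulting scalar would then serve as the regression target when training one or more of the Q networks.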

CPC Classifications

G06N 3/092 G06N 3/08 G06N 3/084 G06N 3/045 G06N 3/047 G06N 3/082 G06N 3/006

Filing Date

2019-05-20

Application No.

17056640

Claims
