Off-line learning for robot control using a reward prediction model

Grant US12576515B2 Kind: B2 Mar 17, 2026

Assignee

GDM Holding LLC

Inventors

Konrad Zolna, Scott Ellison Reed

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for off-line learning using a reward prediction model. One of the methods includes obtaining robot experience data; training, on a first subset of the robot experience data, a reward prediction model that receives a reward input comprising an input observation and generates as output a reward prediction that is a prediction of a task-specific reward for the particular task that should be assigned to the input observation; processing experiences in the robot experience data using the trained reward prediction model to generate a respective reward prediction for each of the processed experiences; and training a policy neural network on (i) the processed experiences and (ii) the respective reward predictions for the processed experiences.
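The four steps in the abstract (obtain experience data, train a reward prediction model on a first subset, predict rewards for the experiences, train a policy on the experiences and predicted rewards) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the data is synthetic, the reward model is a ridge regression rather than a neural network, and the policy is a linear map fitted by reward-weighted regression, swapped in for the policy neural network. The names `fit_reward_model`, `predict_reward`, `fit_policy`, and `run_pipeline` are hypothetical.

```python
import numpy as np


def _with_bias(obs):
    """Append a constant column so the linear models can learn an offset."""
    return np.hstack([obs, np.ones((len(obs), 1))])


def fit_reward_model(obs, rewards, l2=1e-3):
    """Ridge regression from observation to scalar reward.

    Stand-in (assumption) for the reward prediction model.
    """
    X = _with_bias(obs)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ rewards)


def predict_reward(w, obs):
    """Apply the fitted reward model to a batch of observations."""
    return _with_bias(obs) @ w


def fit_policy(obs, actions, rewards, temperature=1.0, l2=1e-3):
    """Reward-weighted regression from observation to action.

    Stand-in (assumption) for training a policy neural network:
    experiences with higher predicted reward get larger weights.
    """
    X = _with_bias(obs)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp((rewards - rewards.max()) / temperature)
    Xw = X * weights[:, None]
    A = X.T @ Xw + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, Xw.T @ actions)


def run_pipeline(seed=0, n=500, n_labeled=50, obs_dim=4, act_dim=2):
    rng = np.random.default_rng(seed)

    # Step 1: obtain experience data (synthetic here, an assumption).
    obs = rng.normal(size=(n, obs_dim))
    actions = rng.normal(size=(n, act_dim))
    true_reward = obs @ rng.normal(size=obs_dim)  # hypothetical task reward

    # Step 2: train the reward model on a first subset that has reward labels.
    labeled = np.arange(n_labeled)
    w = fit_reward_model(obs[labeled], true_reward[labeled])

    # Step 3: generate a reward prediction for every experience.
    predicted = predict_reward(w, obs)

    # Step 4: train the policy on the experiences and their predicted rewards.
    policy = fit_policy(obs, actions, predicted)
    return predicted, true_reward, policy
```

Calling `run_pipeline()` returns the predicted rewards, the synthetic ground-truth rewards, and the fitted policy matrix. In this toy setting only the first 50 experiences carry reward labels, yet all 500 are used for policy training, which is the point of inserting a learned reward model between the raw experience data and the off-line policy update.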

CPC Classifications

B25J 9/161; B25J 9/163; G06N 3/045; G06N 3/006; G06N 3/0464; G06N 3/0895; G06N 3/092; G06N 3/08; G06N 7/01

Filing Date

2021-07-27

Application No.

18018421

Claims
