Off-line learning for robot control using a reward prediction model
Assignee
GDM Holding LLC
Inventors
Konrad Zolna, Scott Ellison Reed
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for off-line learning using a reward prediction model. One of the methods includes obtaining robot experience data; training, on a first subset of the robot experience data, a reward prediction model that receives a reward input comprising an input observation and generates as output a reward prediction, that is, a prediction of the task-specific reward for a particular task that should be assigned to the input observation; processing experiences in the robot experience data using the trained reward prediction model to generate a respective reward prediction for each of the processed experiences; and training a policy neural network on (i) the processed experiences and (ii) the respective reward predictions for the processed experiences.
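The four steps recited in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the application: the experience data is synthetic, the reward prediction model is a logistic regression rather than a neural network, the policy is a linear map fitted by reward-weighted regression (a simple stand-in for training a policy neural network), and all names (`train_reward_model`, `predict_rewards`, `train_policy`, `true_w`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: obtain robot experience data (synthetic stand-in: observation/action pairs).
N, OBS_DIM, ACT_DIM = 500, 4, 2
observations = rng.normal(size=(N, OBS_DIM))
actions = rng.normal(size=(N, ACT_DIM))

# Assumption: only a first subset of experiences carries reward annotations.
true_w = np.array([1.0, -2.0, 0.5, 0.0])  # hypothetical ground-truth reward rule
annotated = slice(0, 100)
annotated_rewards = (observations[annotated] @ true_w > 0).astype(float)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_reward_model(obs, rewards, lr=0.5, steps=500):
    """Step 2: fit a logistic-regression reward model by gradient descent."""
    w = np.zeros(obs.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(obs @ w + b) - rewards
        w -= lr * obs.T @ err / len(rewards)
        b -= lr * err.mean()
    return w, b


def predict_rewards(model, obs):
    """Step 3: generate a reward prediction in [0, 1] for each observation."""
    w, b = model
    return sigmoid(obs @ w + b)


def train_policy(obs, acts, rewards, ridge=1e-3):
    """Step 4 (stand-in): reward-weighted ridge regression from observations to actions."""
    weighted = obs * rewards[:, None]
    gram = obs.T @ weighted + ridge * np.eye(obs.shape[1])
    return np.linalg.solve(gram, weighted.T @ acts)


reward_model = train_reward_model(observations[annotated], annotated_rewards)
predicted_rewards = predict_rewards(reward_model, observations)
policy_weights = train_policy(observations, actions, predicted_rewards)
```

The point of the sketch is the data flow: the reward model sees only the annotated subset, its predictions then label every experience, and the policy is fitted on the full set using those predicted rewards as weights.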
Filing Date
2021-07-27
Application No.
18018421
Claims
20