Knowledge distillation in multi-arm bandit, neural network models for real-time online optimization
Assignee
Indeed, Inc.
Inventors
Ziying Liu, Haiyan Luo, Jianjie Ma, Yu Sun
Abstract
A knowledge distillation system and method train neural networks using a non-conventional replay buffer and augmented data tuples. In at least one embodiment, the system and method pretrain a teacher model that implements a contextual bandit algorithm. A lightweight student model generates online contextual bandit data tuples of context x, arm/action a, and reward/payoff r, which are stored in a replay buffer. The teacher model randomly samples data tuples from the replay buffer and augments them; the augmented data tuples are stored back in the replay buffer. The student model batch-processes the augmented data tuples to update its contextual bandit parameters.
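The replay-buffer distillation loop described in the abstract can be sketched in simplified form. This is an illustrative assumption, not the patented implementation: the buffer, the linear "teacher" reward estimator, the per-arm scalar student, and all function names here are hypothetical stand-ins for the teacher/student models and augmentation step the abstract names.

```python
import random

random.seed(0)  # for reproducibility of the toy run

class ReplayBuffer:
    """Stores online (context x, arm a, reward r) tuples plus augmented tuples."""
    def __init__(self):
        self.tuples = []
    def add(self, x, a, r):
        self.tuples.append((x, a, r))
    def sample(self, k):
        return random.sample(self.tuples, min(k, len(self.tuples)))

def teacher(x, arm):
    """Hypothetical pretrained teacher: reward estimate for (context, arm)."""
    return (arm + 1) * x

def teacher_augment(sampled, n_arms):
    """Teacher relabels each sampled context with its own reward estimate
    for every arm, yielding extra (x, arm, r_teacher) training tuples."""
    return [(x, arm, teacher(x, arm))
            for (x, _, _) in sampled
            for arm in range(n_arms)]

def student_batch_update(weights, batch, lr=0.1):
    """One batch step on squared error between the student's per-arm
    reward estimate (weights[a] * x) and the target reward r."""
    for (x, a, r) in batch:
        pred = weights[a] * x
        weights[a] -= lr * 2 * (pred - r) * x
    return weights

buf = ReplayBuffer()
for x in [0.5, 1.0, 1.5]:              # online tuples observed by the student
    buf.add(x, 0, teacher(x, 0))
for t in teacher_augment(buf.sample(2), n_arms=3):
    buf.add(*t)                        # augmented tuples go back into the buffer

weights = [0.0, 0.0, 0.0]              # lightweight student: one scalar per arm
for _ in range(200):
    weights = student_batch_update(weights, buf.sample(4))
```

Because the teacher's reward surface here is exactly linear per arm, the student's per-arm weights converge toward (arm + 1); the point of the sketch is the data flow: online tuples in, teacher-augmented tuples back into the same buffer, student batch updates out.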
CPC Classifications
Filing Date
2021-05-26
Application No.
17/331,475
Claims
20