Knowledge distillation in multi-arm bandit, neural network models for real-time online optimization
Assignee
Indeed, Inc.
Inventors
Ziying Liu, Haiyan Luo, Jianjie Ma, Yu Sun
Abstract
A knowledge distillation system and method train neural networks using a non-conventional replay buffer and augmented data tuples. In at least one embodiment, the system and method pretrain a teacher model that implements a contextual bandit algorithm. A lightweight student model generates online contextual bandit data tuples of context x, arm/action a, and reward/payoff r, which are stored in a replay buffer. The teacher model randomly samples data tuples from the replay buffer and augments them; the augmented data tuples are stored back in the replay buffer. The student model batch-processes the augmented data tuples to update its contextual bandit parameters.
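The replay-buffer distillation loop described in the abstract can be sketched in simplified form. This is an illustrative assumption, not the patented implementation: the buffer, the linear "teacher" reward estimator, the per-arm scalar student, and all function names here are hypothetical stand-ins for the teacher/student models and augmentation step the abstract names.

```python
import random

random.seed(0)  # for reproducibility of the toy run

class ReplayBuffer:
    """Stores online (context x, arm a, reward r) tuples plus augmented tuples."""
    def __init__(self):
        self.tuples = []
    def add(self, x, a, r):
        self.tuples.append((x, a, r))
    def sample(self, k):
        return random.sample(self.tuples, min(k, len(self.tuples)))

def teacher(x, arm):
    """Hypothetical pretrained teacher: reward estimate for (context, arm)."""
    return (arm + 1) * x

def teacher_augment(sampled, n_arms):
    """Teacher relabels each sampled context with its own reward estimate
    for every arm, yielding extra (x, arm, r_teacher) training tuples."""
    return [(x, arm, teacher(x, arm))
            for (x, _, _) in sampled
            for arm in range(n_arms)]

def student_batch_update(weights, batch, lr=0.1):
    """One batch step on squared error between the student's per-arm
    reward estimate (weights[a] * x) and the target reward r."""
    for (x, a, r) in batch:
        pred = weights[a] * x
        weights[a] -= lr * 2 * (pred - r) * x
    return weights

buf = ReplayBuffer()
for x in [0.5, 1.0, 1.5]:              # online tuples observed by the student
    buf.add(x, 0, teacher(x, 0))
for t in teacher_augment(buf.sample(2), n_arms=3):
    buf.add(*t)                        # augmented tuples go back into the buffer

weights = [0.0, 0.0, 0.0]              # lightweight student: one scalar per arm
for _ in range(200):
    weights = student_batch_update(weights, buf.sample(4))
```

Because the teacher's reward surface here is exactly linear per arm, the student's per-arm weights converge toward (arm + 1); the point of the sketch is the data flow: online tuples in, teacher-augmented tuples back into the same buffer, student batch updates out.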
CPC Classifications
Filing Date
2021-05-26
Application No.
17/331,475
Claims
20