Knowledge Distillation Patent - Multi-Arm Bandit Neural Networks for Real-Time Optimization

ChangeBridge: Patent Grants - AI & Computing (G06N)

Published March 31st, 2026

Detected March 31st, 2026

Summary

The USPTO granted patent US12591789B2 to Indeed, Inc. covering knowledge distillation techniques for training neural networks using multi-arm bandit algorithms with replay buffers and augmented data tuples. The patent application (17331475) was filed on May 26, 2021, and includes 20 claims covering the teacher-student model architecture for contextual bandit optimization. This patent grant establishes intellectual property rights for the company's AI-driven optimization technology.


What changed

The USPTO granted patent US12591789B2 to Indeed, Inc. for a knowledge distillation system that trains neural networks for contextual bandit problems. As described in the abstract, a teacher model that implements a contextual bandit algorithm is pretrained, and a lightweight student model produces online data tuples (context x, arm/action a, reward/payoff r) that are stored in a replay buffer. The teacher randomly samples tuples from the buffer, augments them, and stores the augmented tuples back in the buffer; the student then batch processes the augmented tuples for parameter updates. The patent is classified under CPC G06N 5/022, G06N 3/045, G06N 3/047, and G06N 3/08.

This is a patent grant notice that establishes intellectual property rights but does not impose compliance obligations on third parties. Technology companies developing similar multi-arm bandit or knowledge distillation systems should review the patent claims to assess potential licensing needs or design-around considerations. No immediate regulatory action is required.

Source document (simplified)


Knowledge distillation in multi-arm bandit, neural network models for real-time online optimization

Grant: US12591789B2 | Kind: B2 | Mar 31, 2026

Assignee

Indeed, Inc.

Inventors

Ziying Liu, Haiyan Luo, Jianjie Ma, Yu Sun

Abstract

A knowledge distillation system and method trains neural networks utilizing a non-conventional replay buffer and augmented data tuples. In at least one embodiment, the knowledge distillation system and method pretrain a teacher model that implements a contextual bandit algorithm. A lightweight student model determines online contextual bandit data tuples as to context x, arm/action a, and reward/payoff r. The data tuples are stored in a replay buffer. The teacher model randomly samples data tuples from the replay buffer and augments the sampled data tuples. Augmented data tuples are stored in the replay buffer. The student model batch processes augmented data tuples to update parameters of contextual bandit data tuples.
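
The loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract does not specify model types, the augmentation method, or batch sizes, so everything concrete here is an assumption made for illustration. In particular, the per-arm linear model with epsilon-greedy selection (`LinearBandit`), the jitter-and-relabel augmentation (`augment`), the synthetic two-arm environment, and all numeric settings are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (context x, action a, reward r) tuples."""

    def __init__(self, capacity=1000):
        self.items = deque(maxlen=capacity)  # oldest tuples are evicted first

    def add(self, x, a, r):
        self.items.append((tuple(x), a, r))

    def sample(self, k, rng):
        return rng.sample(list(self.items), min(k, len(self.items)))

    def __len__(self):
        return len(self.items)


class LinearBandit:
    """Assumed stand-in model: one linear reward estimator per arm."""

    def __init__(self, n_arms, dim, lr=0.05, epsilon=0.1):
        self.w = [[0.0] * dim for _ in range(n_arms)]
        self.lr = lr
        self.epsilon = epsilon

    def predict(self, x, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], x))

    def select(self, x, rng):
        # Epsilon-greedy: explore at random, otherwise pick the best-looking arm.
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.w))
        return max(range(len(self.w)), key=lambda a: self.predict(x, a))

    def update_batch(self, batch):
        # One SGD step on squared error per tuple in the batch.
        for x, a, r in batch:
            err = r - self.predict(x, a)
            self.w[a] = [wi + self.lr * err * xi for wi, xi in zip(self.w[a], x)]


def augment(teacher, sampled, rng, noise=0.05):
    """Assumed augmentation: jitter the context, relabel with the teacher's estimate."""
    out = []
    for x, a, _ in sampled:
        x_new = tuple(xi + rng.gauss(0.0, noise) for xi in x)
        out.append((x_new, a, teacher.predict(x_new, a)))
    return out


def run_distillation(steps=200, seed=0):
    rng = random.Random(seed)
    true_w = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical environment, two arms

    def reward(x, a):
        return sum(w * xi for w, xi in zip(true_w[a], x)) + rng.gauss(0.0, 0.1)

    # 1. Pretrain the teacher on logged tuples collected with random actions.
    teacher = LinearBandit(n_arms=2, dim=2, lr=0.1)
    logged = []
    for _ in range(500):
        x = (rng.random(), rng.random())
        a = rng.randrange(2)
        logged.append((x, a, reward(x, a)))
    teacher.update_batch(logged)

    student = LinearBandit(n_arms=2, dim=2)
    buffer = ReplayBuffer()
    for _ in range(steps):
        # 2. The student acts online; its (x, a, r) tuple goes into the buffer.
        x = (rng.random(), rng.random())
        a = student.select(x, rng)
        buffer.add(x, a, reward(x, a))
        # 3. The teacher samples from the buffer and augments the samples.
        augmented = augment(teacher, buffer.sample(8, rng), rng)
        # 4. Augmented tuples are stored back in the buffer.
        for t in augmented:
            buffer.add(*t)
        # 5. The student batch processes the augmented tuples.
        student.update_batch(augmented)
    return teacher, student, buffer
```

Calling `run_distillation()` returns the teacher, the student, and the buffer, so the student's learned weights in `student.w` can be compared against the teacher's. Relabeling rewards with the teacher's prediction is one common way to transfer knowledge from a larger model to a smaller one; the patent claims may define augmentation differently.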