Knowledge Distillation Patent - Multi-Arm Bandit Neural Networks for Real-Time Optimization
Summary
On March 31, 2026, the USPTO granted patent US12591789B2 to Indeed, Inc. The patent covers knowledge distillation techniques that train neural networks for multi-armed (contextual) bandit optimization using a replay buffer and augmented data tuples. The application (No. 17331475) was filed on May 26, 2021. The granted patent contains 20 claims covering a teacher-student model architecture for contextual bandit optimization, which gives Indeed patent rights over this AI-driven optimization technique.
What changed
The USPTO granted patent US12591789B2 to Indeed, Inc. for a knowledge distillation system that trains neural networks using contextual bandit algorithms. In the described system, a pretrained teacher model implements a contextual bandit algorithm, and a lightweight student model generates online contextual bandit data tuples of context x, arm/action a, and reward/payoff r. These tuples are stored in a replay buffer. The teacher randomly samples tuples from the buffer, augments them, and writes the augmented tuples back to the buffer. The student then batch-processes the augmented tuples to update model parameters. The patent is classified under CPC G06N 5/022, G06N 3/045, G06N 3/047, and G06N 3/08.
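The notice says only that the student batch-processes augmented tuples to update parameters. It does not state the training objective. As an illustration only, assuming the student regresses a per-arm reward estimate onto teacher-augmented rewards with a squared-error loss, one batch step over B augmented tuples would be:

```latex
\mathcal{L}(\theta) = \frac{1}{B}\sum_{i=1}^{B}\bigl(f_\theta(x_i, a_i) - \tilde{r}_i\bigr)^2,
\qquad
\theta \leftarrow \theta - \eta\,\nabla_\theta \mathcal{L}(\theta)
```

Here f_θ(x, a) is the student's predicted reward for arm a in context x, r̃_i is the reward attached to the i-th augmented tuple, and η is a learning rate. All three symbols, and the squared-error loss itself, are assumptions introduced for this sketch rather than details taken from the patent.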
This is a patent grant notice: it establishes intellectual property rights but does not impose compliance obligations on third parties. Companies developing similar multi-armed bandit or knowledge distillation systems should review the patent claims to assess potential licensing needs or design-around options. No immediate regulatory action is required.
Source document (simplified)
Knowledge distillation in multi-arm bandit, neural network models for real-time online optimization
Grant: US12591789B2, Kind: B2, Date: Mar 31, 2026
Assignee
Indeed, Inc.
Inventors
Ziying Liu, Haiyan Luo, Jianjie Ma, Yu Sun
Abstract
A knowledge distillation system and method trains neural networks utilizing a non-conventional replay buffer and augmented data tuples. In at least one embodiment, the knowledge distillation system and method pretrain a teacher model that implements a contextual bandit algorithm. A lightweight student model determines online contextual bandit data tuples as to context x, arm/action a, and reward/payoff r. The data tuples are stored in a replay buffer. The teacher model randomly samples data tuples from the replay buffer and augments the sampled data tuples. Augmented data tuples are stored in the replay buffer. The student model batch processes augmented data tuples to update parameters of contextual bandit data tuples.
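The abstract's steps can be traced in a short, self-contained Python sketch. Only the step order comes from the abstract; everything else is an assumption. The teacher is a fixed linear per-arm reward model standing in for a pretrained network, and the student acts epsilon-greedily. The replay buffer is an ordinary FIFO deque rather than the patent's "non-conventional" design. Augmentation, in the hypothetical `augment` helper, has the teacher label every arm for each sampled context. The names `BanditTuple`, `ReplayBuffer`, `LinearArmModel`, `augment`, and `distill` are illustrative and do not come from the patent.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BanditTuple:
    x: tuple[float, ...]  # context x
    a: int                # arm/action a
    r: float              # reward/payoff r (observed or teacher-labelled)


class ReplayBuffer:
    """Assumption: generic FIFO buffer with uniform sampling, not the patented design."""

    def __init__(self, capacity: int) -> None:
        self.data = deque(maxlen=capacity)  # oldest tuples drop out when full

    def add(self, t: BanditTuple) -> None:
        self.data.append(t)

    def sample(self, k: int, rng: random.Random) -> list[BanditTuple]:
        # Uniform random sample without replacement from the current contents.
        return rng.sample(list(self.data), min(k, len(self.data)))


class LinearArmModel:
    """One linear reward estimate per arm: r_hat(x, a) = w[a] . x (hypothetical model)."""

    def __init__(self, weights: list[list[float]]) -> None:
        self.w = [list(row) for row in weights]

    def predict(self, x, a: int) -> float:
        return sum(wi * xi for wi, xi in zip(self.w[a], x))

    def act(self, x, eps: float, rng: random.Random) -> int:
        # Assumption: epsilon-greedy policy; the abstract does not name one.
        if rng.random() < eps:
            return rng.randrange(len(self.w))
        return max(range(len(self.w)), key=lambda a: self.predict(x, a))

    def batch_update(self, batch: list[BanditTuple], lr: float) -> None:
        # One gradient step on mean squared error over the whole batch.
        grads = [[0.0] * len(row) for row in self.w]
        for t in batch:
            err = self.predict(t.x, t.a) - t.r
            for j, xj in enumerate(t.x):
                grads[t.a][j] += 2.0 * err * xj / len(batch)
        for a, g in enumerate(grads):
            for j, gj in enumerate(g):
                self.w[a][j] -= lr * gj


def augment(teacher: LinearArmModel, sampled: list[BanditTuple]) -> list[BanditTuple]:
    # Hypothetical augmentation: the teacher labels every arm for each sampled
    # context, turning one logged tuple into one tuple per arm.
    return [
        BanditTuple(t.x, a, teacher.predict(t.x, a))
        for t in sampled
        for a in range(len(teacher.w))
    ]


def distill(steps: int = 1000, seed: int = 0):
    rng = random.Random(seed)
    dim, n_arms = 3, 3
    # Stand-in for a pretrained teacher; its weights also define the
    # simulated environment's true rewards.
    teacher = LinearArmModel([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    student = LinearArmModel([[0.0] * dim for _ in range(n_arms)])
    buffer = ReplayBuffer(capacity=1000)
    for _ in range(steps):
        # 1. Student acts online and logs a noisy (x, a, r) tuple to the buffer.
        x = tuple(rng.random() for _ in range(dim))
        a = student.act(x, eps=0.1, rng=rng)
        r = teacher.predict(x, a) + rng.gauss(0.0, 0.1)
        buffer.add(BanditTuple(x, a, r))
        # 2. Teacher randomly samples from the buffer and augments the sample.
        augmented = augment(teacher, buffer.sample(8, rng))
        # 3. Augmented tuples are stored back in the buffer.
        for t in augmented:
            buffer.add(t)
        # 4. Student batch-processes the augmented tuples to update its weights.
        student.batch_update(augmented, lr=0.5)
    return teacher, student, buffer


if __name__ == "__main__":
    teacher, student, _ = distill()
    print([[round(w, 3) for w in row] for row in student.w])
```

Because the student trains only on teacher-labelled augmented tuples rather than on the noisy online rewards, its weights in this toy setup converge toward the teacher's. A production system would use neural networks and whatever augmentation the claims actually specify.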
CPC Classifications
G06N 5/022 G06N 3/045 G06N 3/047 G06N 3/08
Filing Date
2021-05-26
Application No.
17331475
Claims
20