CURATION OF A TRAINING DATASET OF A MACHINE LEARNING MODEL

Application US20260080240A1 Kind: A1 Mar 19, 2026

Inventors

ABEDELKADER ASI, SHAHAR KEREN, OMER LUXEMBOURG

Abstract

A supervised training dataset of a machine learning model is curated to eliminate redundant training samples. The training samples are grouped into classes. An embedding of each training sample is used to search, within each class, for pairs of training samples having closely matching embeddings, and one training sample of each such pair is eliminated. The search uses approximate nearest neighbor (ANN) search to find the redundant pairs. A curation process reduces the size of the training dataset until a user-defined removal rate is reached or until the spread of the distribution of the training samples within each class and between classes meets a desired threshold.
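The abstract describes the curation loop only at a high level. Below is a minimal Python sketch under several assumptions that are not from the application. Embeddings are plain float lists. "Closely matching" is read as cosine similarity at or above a hypothetical `sim_threshold`. The approximate nearest neighbor search is stood in for by random-hyperplane locality-sensitive hashing (LSH), which is one ANN technique among many; the abstract does not say which one is used. Only the user-defined removal-rate stopping rule is modeled, and the distribution-spread criterion is omitted. All function and parameter names (`curate`, `removal_rate`, `n_planes`, `n_tables`, and so on) are illustrative.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def lsh_signature(vec, planes):
    # One bit per random hyperplane: which side of the plane the vector lies on.
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)


def curate(samples, removal_rate=0.1, sim_threshold=0.95,
           n_planes=8, n_tables=4, seed=0):
    """samples: list of (label, embedding). Returns indices of samples to keep.

    Hypothetical sketch: LSH stands in for the unspecified ANN search, and
    only the removal-rate stopping rule is implemented.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    dim = len(samples[0][1])
    # Several independent hash tables reduce the chance of missing a close pair.
    tables = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
              for _ in range(n_tables)]

    # Group sample indices by class label; pairs are only searched within a class.
    by_class = defaultdict(list)
    for idx, (label, _) in enumerate(samples):
        by_class[label].append(idx)

    # Candidate redundant pairs (i < j) mapped to their cosine similarity.
    candidates = {}
    for indices in by_class.values():
        for planes in tables:
            buckets = defaultdict(list)
            for idx in indices:
                buckets[lsh_signature(samples[idx][1], planes)].append(idx)
            for bucket in buckets.values():
                for i, j in combinations(bucket, 2):
                    if (i, j) not in candidates:
                        sim = cosine(samples[i][1], samples[j][1])
                        if sim >= sim_threshold:
                            candidates[(i, j)] = sim

    budget = int(removal_rate * len(samples))
    removed = set()
    # Most similar pairs first; drop the second member of each pair.
    for (i, j), _ in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if len(removed) >= budget:
            break
        if i in removed or j in removed:
            continue
        removed.add(j)
    return [idx for idx in range(len(samples)) if idx not in removed]


samples = [("cat", [1.0, 0.0]), ("cat", [1.0, 0.0]),
           ("cat", [0.0, 1.0]), ("dog", [1.0, 0.0])]
print(curate(samples, removal_rate=0.5))  # [0, 2, 3]
```

In the usage example, the duplicate "cat" embedding at index 1 is removed, while the identical "dog" embedding at index 3 survives because pairs are only compared within a class. Processing candidates from most to least similar spends the removal budget on the most redundant pairs first. Skipping a pair when either member is already gone avoids discarding both copies of a near-duplicate.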

CPC Classifications

G06N 3/08

Filing Date

2024-09-18

Application No.

18889291