CURATION OF A TRAINING DATASET OF A MACHINE LEARNING MODEL
Application
US20260080240A1
Kind: A1
Mar 19, 2026
Inventors
ABEDELKADER ASI, SHAHAR KEREN, OMER LUXEMBOURG
Abstract
A supervised training dataset for a machine learning model is curated to eliminate redundant training samples. The training samples are grouped into classes. An embedding of each training sample is used to search, within each class, for pairs of training samples having closely matching embeddings, and one training sample of each such pair is eliminated. The search for redundant pairs uses an approximate nearest neighbor search. The curation process reduces the size of the training dataset until a user-defined removal rate is reached or until the spread of the distribution of the training samples within each class and between classes meets a desired threshold.
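The curation loop summarized in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. It makes several assumptions that do not come from the application: the function names `cosine_distance` and `curate`, the parameters `removal_rate` and `min_distance`, and the use of cosine distance are all hypothetical. An exact all-pairs comparison stands in for the approximate nearest neighbor search, and a fixed pair-distance cutoff stands in for the spread-based stopping criterion, which is not modeled here.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def curate(embeddings, labels, removal_rate=0.25, min_distance=0.05):
    """Return the sorted indices of the samples kept after curation.

    Hypothetical parameters (assumptions, not taken from the source):
      removal_rate  maximum fraction of samples that may be dropped
      min_distance  pairs closer than this count as redundant
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Collect every within-class pair closer than the cutoff.
    # Exact all-pairs comparison; at scale an approximate nearest
    # neighbor index would replace these nested loops.
    candidates = []
    for members in by_class.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                d = cosine_distance(embeddings[a], embeddings[b])
                if d < min_distance:
                    candidates.append((d, a, b))

    # Drop one sample from each redundant pair, closest pairs first,
    # until the removal budget is spent.
    budget = int(removal_rate * len(labels))
    removed = set()
    for _, a, b in sorted(candidates):
        if len(removed) >= budget:
            break
        if a in removed or b in removed:
            continue  # pair already broken by an earlier removal
        removed.add(b)

    return [i for i in range(len(labels)) if i not in removed]


# Toy data: samples 0 and 1 are near-duplicates in class "cat";
# sample 3 is close to sample 0 but belongs to a different class.
embeddings = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [1.0, 0.001], [0.7, 0.7]]
labels = ["cat", "cat", "cat", "dog", "dog"]
kept = curate(embeddings, labels, removal_rate=0.4, min_distance=0.05)
print(kept)
```

Because pairs are only compared within a class, samples 0 and 3 are both retained in this toy run even though their embeddings nearly coincide. Processing the closest pairs first means the removal budget is spent on the most redundant samples.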
CPC Classifications
G06N 3/08
Filing Date
2024-09-18
Application No.
18889291