METHODS AND SYSTEMS FOR CURATING HIGH-QUALITY DATA SAMPLES TO ENHANCE LARGE LANGUAGE MODEL PERFORMANCE

Application US20260093981A1 Kind: A1 Apr 02, 2026

Assignee

ACCENTURE GLOBAL SOLUTIONS LIMITED

Inventors

Jinlong Pang, Jiaheng Wei, Ankit Parag Shah, Yujia Bao, Yaxuan Wang, Wei Wei, Yang Liu, Chen Qian, Zhaowei Zhu

Abstract

Methods and systems for curating high-quality data samples to enhance Large Language Model (LLM) performance are disclosed. An input prompt corresponding to data samples of one or more datasets related to an enterprise is generated. Based on the input prompt, initial scores for the data samples are generated via implementation of one or more LLMs. Upon generating the initial scores, score curation is performed to correct score errors and to generate curated scores for the data samples. Further, diversity of the data samples is measured to generate long-tail scores for the data samples. The curated scores and the long-tail scores are utilized to determine the high-quality data samples from the data samples. The high-quality data samples are used to fine-tune a target LLM.
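
The selection flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how score errors are corrected, how diversity is measured, or how the two scores are combined, so everything below is an assumption made for illustration: the function names are hypothetical, score curation is approximated by adopting the score that all k nearest neighbors in embedding space agree on, the long-tail score is taken as the mean cosine distance to those neighbors, and samples are ranked by curated score with the long-tail score as a tie-breaker. The initial scores and embeddings are toy values standing in for LLM ratings and a real embedding model.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(embeddings, i, k):
    """Return (distance, index) pairs for the k samples closest to sample i."""
    pairs = [(cosine_distance(embeddings[i], e), j)
             for j, e in enumerate(embeddings) if j != i]
    return sorted(pairs)[:k]

def curate_scores(initial_scores, embeddings, k=2):
    """Assumed curation rule: if all k neighbors share one score that differs
    from the sample's own score, treat the sample's score as an error."""
    curated = []
    for i, score in enumerate(initial_scores):
        neighbor_scores = {initial_scores[j]
                           for _, j in nearest_neighbors(embeddings, i, k)}
        if len(neighbor_scores) == 1 and score not in neighbor_scores:
            curated.append(next(iter(neighbor_scores)))
        else:
            curated.append(score)
    return curated

def long_tail_scores(embeddings, k=2):
    """Assumed diversity measure: mean distance to the k nearest neighbors.
    Isolated (rare) samples get higher values."""
    return [sum(d for d, _ in nearest_neighbors(embeddings, i, k)) / k
            for i in range(len(embeddings))]

def select_samples(curated, long_tail, budget):
    """Assumed combination: rank by curated score, break ties by long-tail score."""
    order = sorted(range(len(curated)),
                   key=lambda i: (-curated[i], -long_tail[i]))
    return order[:budget]

# Toy data: initial_scores stand in for LLM ratings of five samples.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2], [0.0, 1.0], [0.1, 0.99]]
initial_scores = [5, 1, 5, 3, 3]

curated = curate_scores(initial_scores, embeddings)
tail = long_tail_scores(embeddings)
chosen = select_samples(curated, tail, budget=2)
```

In this toy run, sample 1 starts with a score of 1 but sits between two samples rated 5, so the assumed rule raises it to 5. It is still left out of the two-sample selection because it is the most redundant of the three top-rated samples. The selected indices would then identify the subset passed to whatever fine-tuning procedure is used for the target LLM.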

CPC Classifications

G06N 3/08 G06N 3/0475

Filing Date

2025-09-30

Application No.

19346080