METHODS AND SYSTEMS FOR CURATING HIGH-QUALITY DATA SAMPLES TO ENHANCE LARGE LANGUAGE MODEL PERFORMANCE
Assignee
ACCENTURE GLOBAL SOLUTIONS LIMITED
Inventors
Jinlong Pang, Jiaheng Wei, Ankit Parag Shah, Yujia Bao, Yaxuan Wang, Wei Wei, Yang Liu, Chen Qian, Zhaowei Zhu
Abstract
Methods and systems for curating high-quality data samples to enhance Large Language Model (LLM) performance are disclosed. An input prompt corresponding to data samples of one or more datasets related to an enterprise is generated. Based on the input prompt, initial scores for the data samples are generated using one or more LLMs. Upon generating the initial scores, score curation is performed to correct score errors and to generate curated scores for the data samples. Further, the diversity of the data samples is measured to generate long-tail scores for the data samples. The curated scores and the long-tail scores are used to select the high-quality data samples from the data samples. The high-quality data samples are then used to fine-tune a target LLM.
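The abstract names the pipeline stages (initial LLM scores, score curation, long-tail diversity scores, selection) but does not disclose how any of them is computed. The following is a minimal illustrative sketch under stated assumptions, not the claimed method. It assumes integer ratings and a precomputed embedding vector for each sample. The hypothetical helper `curate_scores` replaces a rating only when a strict majority of its k nearest neighbours agree on a different rating (a simple stand-in for score-error correction). The hypothetical `long_tail_scores` measures rarity as the mean cosine distance to the k nearest neighbours. `select_high_quality` then ranks samples by curated score, breaking ties by long-tail score. All function names, parameters, and the value of `k` are assumptions.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_indices(embeddings, i, k):
    """Indices of the k nearest neighbours of sample i, excluding i itself."""
    dists = [(cosine_distance(embeddings[i], embeddings[j]), j)
             for j in range(len(embeddings)) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def curate_scores(initial_scores, embeddings, k=2):
    """Hypothetical curation: adopt the neighbours' rating only when a strict
    majority of the k nearest neighbours agree on it; otherwise keep the score."""
    curated = []
    for i, score in enumerate(initial_scores):
        neighbour_scores = [initial_scores[j] for j in knn_indices(embeddings, i, k)]
        value, count = Counter(neighbour_scores).most_common(1)[0]
        curated.append(value if count > k / 2 else score)
    return curated


def long_tail_scores(embeddings, k=2):
    """Hypothetical diversity measure: mean cosine distance to the k nearest
    neighbours, so larger values indicate rarer (long-tail) samples."""
    return [sum(cosine_distance(embeddings[i], embeddings[j])
                for j in knn_indices(embeddings, i, k)) / k
            for i in range(len(embeddings))]


def select_high_quality(initial_scores, embeddings, budget, k=2):
    """Rank by curated score first, then by long-tail score; keep `budget` indices."""
    curated = curate_scores(initial_scores, embeddings, k)
    tail = long_tail_scores(embeddings, k)
    order = sorted(range(len(initial_scores)),
                   key=lambda i: (curated[i], tail[i]), reverse=True)
    return order[:budget]


if __name__ == "__main__":
    # Toy data: sample 2 sits in the cluster rated 5 but received an outlying 2.
    scores = [5, 5, 2, 3, 3]
    embeddings = [[1, 0], [0.9, 0.1], [0.95, 0.05], [0, 1], [0.1, 0.9]]
    print(curate_scores(scores, embeddings))                  # [5, 5, 5, 3, 3]
    print(select_high_quality(scores, embeddings, budget=3))  # [1, 0, 2]
```

Ranking on the curated score first and using the long-tail score only as a tie-breaker is one possible way to combine the two signals. A weighted sum or a per-score-bucket diversity quota would be equally plausible readings of the abstract.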
CPC Classifications
Filing Date
2025-09-30
Application No.
19346080