DATASET CREATION FOR LARGE LANGUAGE MODEL FINETUNING

Application US20260080261A1 Kind: A1 Mar 19, 2026

Inventors

Omkar Anil Gune, Rushikesh Khilare

Abstract

A general large language model (LLM) processes a prompt on a current region, which includes a set of chunks of a domain-specific source document, to generate a question for the current region and an answer to the question. The current region is expanded by incorporating adjacent chunks into the set of chunks. The general LLM processes the question and the current region to revise the answer and to identify a set of second answer chunks in the current region used to revise the answer. A question vector embedding is generated for the question, and retrieval augmentation matching is performed between the question vector embedding and chunk vector embeddings of the chunks. The general LLM processes the question and the current region to revise the answer and to identify a set of third answer chunks used to revise the answer. A domain-specific LLM is updated with the question and the answer.
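
The two non-LLM steps in the abstract, region expansion and retrieval matching, can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The function names (`expand_region`, `embed`, `cosine`, `retrieve`) are hypothetical, the expansion of one chunk per side is an assumption, and the bag-of-words `embed` is a toy stand-in for whatever embedding model would actually produce the question and chunk vector embeddings.

```python
import math
from collections import Counter


def expand_region(region, num_chunks):
    """Grow a region (list of chunk indices) by one adjacent chunk per side.

    Assumption: the abstract only says "adjacent chunks"; one chunk on
    each side, clipped to the document bounds, is chosen for illustration.
    """
    lo, hi = min(region), max(region)
    return list(range(max(0, lo - 1), min(num_chunks - 1, hi + 1) + 1))


def embed(text):
    """Toy bag-of-words vector (assumption; a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return indices of the top_k chunks most similar to the question, in document order."""
    q_vec = embed(question)
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(q_vec, embed(chunks[i])),
        reverse=True,
    )
    return sorted(ranked[:top_k])
```

In such a sketch, the chunk indices returned by `retrieve` would define the region passed back to the general LLM for the final answer revision.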

CPC Classifications

G06N 3/096 G06N 3/042

Filing Date

2025-08-15

Application No.

19301403
