DATASET CREATION FOR LARGE LANGUAGE MODEL FINETUNING
Inventors
Omkar Anil Gune, Rushikesh Khilare
Abstract
A general large language model (LLM) processes a prompt over a current region, which comprises a set of chunks of a domain-specific source document, to generate a question for the current region and an answer to the question. The current region is expanded by adding adjacent chunks to the set of chunks. The general LLM processes the question and the current region to revise the answer and to identify a set of second answer chunks in the current region that were used to revise the answer. A question vector embedding is generated for the question, and retrieval augmentation matching is performed between the question vector embedding and chunk vector embeddings of the chunks. The general LLM processes the question and the current region to revise the answer and to identify a set of third answer chunks used to revise the answer. A domain-specific LLM is updated with the question and the answer.
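As an illustration only, the region expansion and embedding matching steps described in the abstract might be sketched as follows. The abstract does not specify how regions are represented or how matching is scored, so this sketch assumes a region is a contiguous range of chunk indices and that matching uses cosine similarity; the function names `expand_region`, `cosine`, and `top_matching_chunks` and the parameters `step` and `k` are hypothetical.

```python
import math


def expand_region(region, num_chunks, step=1):
    """Grow a contiguous (start, end) chunk-index range by `step` chunks per side.

    Assumption: a region is an inclusive index range over the document's chunks.
    The result is clamped to the document boundaries.
    """
    start, end = region
    return max(0, start - step), min(num_chunks - 1, end + step)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matching_chunks(question_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the question embedding.

    Assumption: cosine similarity stands in for the matching step, whose exact
    scoring is not given in the abstract.
    """
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(question_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, the indices returned by `top_matching_chunks` would determine which chunks are passed to the general LLM together with the question for the next revision of the answer; the LLM calls themselves are omitted because the abstract does not describe their interface.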
CPC Classifications
Filing Date
2025-08-15
Application No.
19301403