LARGE LANGUAGE MODELS FOR NL2SQL WITH LONG CONTEXT FINETUNING

Application US20260080260A1 Kind: A1 Mar 19, 2026

Assignee

Oracle International Corporation

Inventors

Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Steve Wai-Chun Siu, Dalu Guo, Budhaditya Saha, Thanh Tien Vu, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Anshuk Pal Chaudhuri, Prabhakara Reddy Munnangi, Subash Kumar Bhamidipati

Abstract

The present disclosure relates to manufacturing training and testing data by leveraging data augmentation techniques to generate examples with long context database schemas. Aspects are directed towards: accessing a training dataset comprising training examples, where each training example may include (i) a prompt including a natural language utterance and a database schema having one or more tables, and (ii) a gold logical form corresponding to the natural language utterance; combining the tables from the database schemas in the training examples to generate a combined database schema set; generating a set of long context training examples based on the training dataset and the combined database schema set, where a long context database schema is incorporated into a selected training example to generate a long context training example; and training a generative artificial intelligence model with at least the set of long context training examples to generate a trained generative artificial intelligence model.
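The augmentation steps in the abstract (pool every table into a combined schema set, then enlarge each example's schema) can be sketched in Python. This is a minimal, hypothetical illustration, not the applicant's implementation. The names `Table`, `TrainingExample`, `combine_schemas`, `make_long_context_example`, `build_long_context_set`, and `render_prompt`, along with the `CREATE TABLE` prompt layout, are invented here. The sketch also assumes that a long context schema is built by adding randomly sampled tables from the combined set to the example's own schema as distractors. Because the gold logical form never references those added tables, it is kept unchanged.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple[str, ...]


@dataclass
class TrainingExample:
    utterance: str            # natural language question
    schema: list[Table]       # database schema shown in the prompt
    gold_logical_form: str    # e.g. the gold SQL query


def combine_schemas(examples: list[TrainingExample]) -> list[Table]:
    """Pool tables from every example's schema into one combined set.

    Assumption: tables are de-duplicated by name (first occurrence wins)
    so the combined set never contains two tables with the same name.
    """
    pooled: dict[str, Table] = {}
    for ex in examples:
        for table in ex.schema:
            pooled.setdefault(table.name, table)
    return list(pooled.values())


def make_long_context_example(example: TrainingExample, combined: list[Table],
                              num_extra: int, rng: random.Random) -> TrainingExample:
    """Add up to num_extra distractor tables to the example's own schema."""
    own_names = {t.name for t in example.schema}
    # Skip tables whose names clash with the example's own tables so the
    # gold logical form still refers to exactly one table per name.
    candidates = [t for t in combined if t.name not in own_names]
    extra = rng.sample(candidates, min(num_extra, len(candidates)))
    long_schema = list(example.schema) + extra
    rng.shuffle(long_schema)  # hide the relevant tables among the distractors
    return TrainingExample(example.utterance, long_schema, example.gold_logical_form)


def build_long_context_set(examples: list[TrainingExample], num_extra: int = 20,
                           seed: int = 0) -> list[TrainingExample]:
    """Generate one long context example per original training example."""
    rng = random.Random(seed)
    combined = combine_schemas(examples)
    return [make_long_context_example(ex, combined, num_extra, rng) for ex in examples]


def render_prompt(example: TrainingExample) -> str:
    """Serialize schema and question into a hypothetical prompt layout."""
    ddl = [f"CREATE TABLE {t.name} ({', '.join(t.columns)});" for t in example.schema]
    return "\n".join(ddl + [f"Question: {example.utterance}"])
```

In a fine-tuning pipeline, each generated example's `render_prompt(...)` output would serve as the model input and `gold_logical_form` as the target. Raising `num_extra` lengthens the prompt, which lets the model practise locating the relevant tables within a large schema.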

CPC Classifications

G06N 3/096, G06F 16/243, G06N 3/0475

Filing Date

2025-01-23

Application No.

19035561