EXTRACTION OF CHEMICAL MOLECULES AND ASSOCIATED PROPERTIES FROM TEXT AND COMPLEX TABLES USING LLMs
Assignee
Tata Consultancy Services Limited
Inventors
ANKUR KRISHNA, TRINATH GADUPARTHI, ARPIT VISHWAKARMA
Abstract
Conventional models extract chemical data by decomposing complex user queries and querying tables. This disclosure relates generally to a method and system for extracting chemical molecules and their associated target properties from text and complex tables using Large Language Models (LLMs). The disclosed method extracts a plurality of molecular property values associated with the chemical molecules from a plurality of data sources by generating a chemical composition schema using LLMs. The chemical composition schema acts as a lens through which tabular information and text are viewed. Relevant tables are identified, and tabular information comprising chemical composition instances is extracted from a plurality of documents using the chemical composition schema together with LLMs and prompting techniques. The chemical composition instances are curated and reconciled into a unified knowledge graph, which is then used for querying. The disclosed method ensures precision in tabular information extraction without the need for extensive model training.
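The reconciliation and querying steps named in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the LLM-driven schema generation and extraction are omitted, and the `CompositionInstance` fields, the `normalize`, `build_graph`, and `query` helpers, and the placeholder molecule names and values are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical schema: these field names are assumptions, not taken from the disclosure.
@dataclass(frozen=True)
class CompositionInstance:
    molecule: str  # molecule name as it appeared in the source
    prop: str      # target property name
    value: float   # extracted property value
    unit: str      # unit string as extracted
    source: str    # provenance, e.g. "doc1:table1" or "doc2:text"


def normalize(name: str) -> str:
    """Toy reconciliation key: ignore case and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_graph(instances):
    """Merge instances into {molecule: {property: {(value, unit, source), ...}}}."""
    graph = defaultdict(lambda: defaultdict(set))
    for inst in instances:
        key = normalize(inst.molecule)
        graph[key][normalize(inst.prop)].add((inst.value, inst.unit, inst.source))
    return graph


def query(graph, molecule, prop):
    """Return every recorded (value, unit, source) for a molecule/property pair."""
    return sorted(graph.get(normalize(molecule), {}).get(normalize(prop), set()))


# Placeholder instances standing in for LLM-extracted table rows and text mentions.
instances = [
    CompositionInstance("Molecule A", "Property X", 1.5, "unit", "doc1:table1"),
    CompositionInstance("molecule  a", "property x", 1.5, "unit", "doc2:text"),
    CompositionInstance("Molecule B", "Property X", 2.0, "unit", "doc1:table1"),
]
graph = build_graph(instances)
result = query(graph, "MOLECULE A", "Property X")
```

Here the two differently spelled mentions of the placeholder "Molecule A" collapse onto one node, and each value keeps its provenance so that conflicting sources could be inspected later. A real system would likely reconcile on a canonical chemical identifier rather than a lowercased name.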
CPC Classifications
Filing Date
2025-09-04
Application No.
19319356