System and method for extracting and categorizing information from online sources

Grant US12585714B1 Kind: B1 Mar 24, 2026

Assignee

6SENSE INSIGHTS, INC.

Inventors

Ernest Kirubakaran Selvaraj, Samira Golsefid, Viral Tarun Bajaria, Satish Arjun Chilloji, Akshay Rajendra Shah, Amresh Sekar, Shubham Kumar Sunwalka

Abstract

A system and method for efficiently extracting and categorizing business information from online sources is disclosed. The system comprises a web crawler that obtains company domains from a database and collects depth-1 URLs from company homepages. A classification model, utilizing a fine-tuned BERT architecture, predicts which URLs contain relevant information for generating tags. A content extractor then extracts content from these predicted URLs using one or more modules. Finally, a large language model (LLM) processes the extracted content and generates tags using custom prompts designed for each tag category. These prompts are tailored to the nature of the extracted content, enhancing the context provided to the LLM. This multi-stage approach addresses challenges in processing large-scale, unstructured business data from diverse web sources, potentially offering improved efficiency, scalability, and accuracy in automated business intelligence gathering.
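
The stages named in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names (`depth1_urls`, `generate_tags`), the same-host filtering rule, and the prompt templates are all assumptions. The fine-tuned BERT classifier, the content extractor, and the LLM are represented by plain callables passed in by the caller, since the abstract does not specify their interfaces.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def depth1_urls(homepage_url, homepage_html):
    """Return same-host URLs linked directly from the homepage (depth 1).

    Assumption: "depth-1" is read here as links one click away from the
    homepage on the same host; the abstract does not define it further.
    """
    parser = _LinkCollector()
    parser.feed(homepage_html)
    home_host = urlparse(homepage_url).netloc.lower()
    seen, result = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments.
        absolute, _fragment = urldefrag(urljoin(homepage_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() != home_host:
            continue  # skip links to other hosts
        if absolute.rstrip("/") == homepage_url.rstrip("/"):
            continue  # skip links back to the homepage itself
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def generate_tags(homepage_url, homepage_html, is_relevant, extract, ask_llm, prompts):
    """Run the four stages: crawl, classify URLs, extract, prompt per category.

    is_relevant(url) -> bool   hypothetical stand-in for the URL classifier
    extract(url) -> str        hypothetical stand-in for the content extractor
    ask_llm(prompt) -> str     hypothetical stand-in for the LLM call
    prompts                    dict of tag category -> template with {content}
    """
    relevant = [u for u in depth1_urls(homepage_url, homepage_html) if is_relevant(u)]
    content = "\n\n".join(extract(u) for u in relevant)
    return {
        category: ask_llm(template.format(content=content))
        for category, template in prompts.items()
    }
```

Passing the model stages in as callables keeps the sketch runnable without any model dependency; in practice each callable would wrap whatever classifier, extractor, or LLM client is actually used.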

CPC Classifications

G06F 16/951, G06F 16/906, G06F 40/10, G06F 40/20, G06F 40/30, G06F 40/40, G06F 16/955, G06F 16/9566, G06N 5/04, G06N 5/045, G06N 20/00

Filing Date

2025-01-15

Application No.

19022441

Claims

10