USPTO Patent for a System That Extracts and Categorizes Online Business Information
Summary
The USPTO has granted a patent (US12585714B1) to 6SENSE INSIGHTS, INC. for a system and method to extract and categorize online business information. The system uses a fine-tuned BERT model for URL classification and an LLM for content processing and tag generation.
What changed
The United States Patent and Trademark Office (USPTO) has issued patent US12585714B1 to 6SENSE INSIGHTS, INC. The patent covers a system and method for extracting and categorizing online business information. Key components include a web crawler, a classification model based on a fine-tuned BERT architecture to identify relevant URLs, a content extractor, and a large language model (LLM) that processes extracted content to generate tags using custom prompts. This technology aims to improve the efficiency, scalability, and accuracy of automated business intelligence gathering from diverse online sources.
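The four components described above form a linear pipeline: classify the crawled URLs, extract content from the relevant ones, then prompt an LLM once per tag category. The Python sketch below illustrates only that data flow and is not 6SENSE INSIGHTS' implementation. `Pipeline`, `PROMPT_TEMPLATES`, the two tag categories, and the comma-separated reply format are hypothetical. The fine-tuned BERT classifier, the content extractor, and the LLM are represented by plain callables so that any real model or API could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical per-category prompt templates; the categories and wording are
# illustrative assumptions, not taken from the patent.
PROMPT_TEMPLATES: Dict[str, str] = {
    "industry": "List the industries this company serves, comma-separated.\n\n{content}",
    "products": "List this company's main products, comma-separated.\n\n{content}",
}


@dataclass
class Pipeline:
    """Chains the classify -> extract -> tag stages over a company's URLs."""

    is_relevant: Callable[[str], bool]  # stand-in for the fine-tuned BERT URL classifier
    extract: Callable[[str], str]       # stand-in for the content extractor module(s)
    llm: Callable[[str], str]           # stand-in for a call to a large language model

    def tag_company(self, urls: List[str]) -> Dict[str, List[str]]:
        # Keep only the URLs the classifier predicts are useful for tagging.
        relevant = [u for u in urls if self.is_relevant(u)]
        # Extract text from each relevant page and concatenate it.
        content = "\n".join(self.extract(u) for u in relevant)
        # Send one category-specific prompt per tag category and parse the
        # (assumed) comma-separated reply into a list of tags.
        tags: Dict[str, List[str]] = {}
        for category, template in PROMPT_TEMPLATES.items():
            reply = self.llm(template.format(content=content))
            tags[category] = [t.strip() for t in reply.split(",") if t.strip()]
        return tags


if __name__ == "__main__":
    # Toy stand-ins: a keyword filter instead of BERT, a canned reply instead of an LLM.
    pipeline = Pipeline(
        is_relevant=lambda url: any(k in url for k in ("/about", "/products")),
        extract=lambda url: f"text of {url}",
        llm=lambda prompt: "Software, Analytics" if "industries" in prompt else "Platform",
    )
    print(pipeline.tag_company([
        "https://example.com/about",
        "https://example.com/careers",
        "https://example.com/products",
    ]))
    # -> {'industry': ['Software', 'Analytics'], 'products': ['Platform']}
```

Issuing a separate prompt per category loosely mirrors the idea of custom prompts for each tag category; keeping every stage as a callable means the demo runs without any model or network access.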
This patent grant is primarily an intellectual property matter and does not impose direct regulatory obligations on businesses. However, companies developing or using similar AI-driven information extraction and categorization technologies should be aware of it: it may limit their freedom to operate, or make a licensing agreement necessary, if their systems fall within the scope of the patent's claims. The application was filed on January 15, 2025, and the patent was granted on March 24, 2026.
Source document (simplified)
System and method for extracting and categorizing information from online sources
Grant: US12585714B1, Kind: B1, granted Mar 24, 2026
Assignee
6SENSE INSIGHTS, INC.
Inventors
Ernest Kirubakaran Selvaraj, Samira Golsefid, Viral Tarun Bajaria, Satish Arjun Chilloji, Akshay Rajendra Shah, Amresh Sekar, Shubham Kumar Sunwalka
Abstract
A system and method for efficiently extracting and categorizing business information from online sources is disclosed. The system comprises a web crawler that obtains company domains from a database and collects depth-1 URLs from company homepages. A classification model, utilizing a fine-tuned BERT architecture, predicts which URLs contain relevant information for generating tags. A content extractor then extracts content from these predicted URLs using one or more modules. Finally, a large language model (LLM) processes the extracted content and generates tags using custom prompts designed for each tag category. These prompts are tailored to the nature of the extracted content, enhancing the context provided to the LLM. This multi-stage approach addresses challenges in processing large-scale, unstructured business data from diverse web sources, potentially offering improved efficiency, scalability, and accuracy in automated business intelligence gathering.
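The crawler's first step, collecting depth-1 URLs (pages linked directly from a company homepage), can be sketched with Python's standard library. This is an illustrative assumption, not the patented crawler: `depth1_urls` is a hypothetical helper that takes already-fetched homepage HTML instead of making network requests, and the same-host filter, fragment stripping, and exclusion of the homepage itself are design choices the abstract does not specify.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def depth1_urls(homepage_url: str, html: str) -> List[str]:
    """Return unique same-host URLs linked directly from the homepage (depth 1)."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    home = urlparse(homepage_url)
    seen = set()
    result: List[str] = []
    for href in parser.hrefs:
        # Resolve relative links against the homepage and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(homepage_url, href))
        parsed = urlparse(absolute)
        # Skip mailto:, javascript:, and links to other hosts.
        if parsed.scheme not in ("http", "https") or parsed.netloc != home.netloc:
            continue
        # Skip the homepage itself (depth 0) and duplicates.
        if absolute.rstrip("/") == homepage_url.rstrip("/") or absolute in seen:
            continue
        seen.add(absolute)
        result.append(absolute)
    return result


if __name__ == "__main__":
    homepage_html = """
    <a href="/about">About</a>
    <a href="products.html#top">Products</a>
    <a href="https://example.com/about">About (again)</a>
    <a href="https://twitter.com/example">Twitter</a>
    <a href="mailto:info@example.com">Email</a>
    <a href="/">Home</a>
    """
    print(depth1_urls("https://example.com/", homepage_html))
    # -> ['https://example.com/about', 'https://example.com/products.html']
```

Because hosts are compared exactly, `www.example.com` and `example.com` are treated as different sites here; a production crawler would also need fetching, robots.txt handling, and rate limiting.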
CPC Classifications
G06F 16/951, G06F 16/906, G06F 40/10, G06F 40/20, G06F 40/30, G06F 40/40, G06F 16/955, G06F 16/9566, G06N 5/04, G06N 5/045, G06N 20/00
Filing Date
2025-01-15
Application No.
19022441
Claims
10
Source
ChangeBridge: Patent Grants - AI & Computing (G06N)