ADAPTIVE DOCUMENT CONTENT EXTRACTION VIA ENTROPY-GUIDED GLOBAL ALIGNMENT

Application US20260080704A1 Kind: A1 Mar 19, 2026

Assignee

Richard Hermann

Inventors

Richard Hermann

Abstract

Disclosed are a system and method for extracting content from electronic documents that address the limitations of rigid, template-based approaches and the overfitting issues of machine learning approaches. The method begins by identifying content features and ranking them by Shannon entropy. The highest-ranked feature or features are used to identify and match “Landmarks” (content that serves as distinct global anchor points for establishing global alignment between documents). With these Landmarks as a foundation, an adaptive, stepwise global alignment process matches the remaining content. This process uses a two-stage technique: deterministic features first identify a set of candidate matches, and non-deterministic spatial features then select the single best match from the candidates based on its geometric coherence with already-aligned items. In the final stage, large language models (LLMs) are selectively employed to generalize the discovered features and relationships into reusable, abstracted prompts. This allows the system to adapt to unseen document formats with higher accuracy than brute-force prompting.
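
The abstract does not specify data structures or scoring functions, so the following is only a minimal sketch of the first three steps under stated assumptions: each content item is a dictionary with hypothetical fields "text", "kind", "x", and "y"; a Landmark is taken to be an item whose value for the top-ranked feature is unique in both documents; the deterministic stage is approximated by equality on "kind"; and geometric coherence is approximated by how well an item's offsets to the Landmarks are preserved. None of these names or choices come from the application itself.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rank_features(items, feature_names):
    """Return feature names ordered by entropy, highest first."""
    scores = {n: shannon_entropy([it[n] for it in items]) for n in feature_names}
    return sorted(feature_names, key=lambda n: scores[n], reverse=True)

def find_landmarks(doc_a, doc_b, feature):
    """Assumption: pair items whose feature value is unique in both documents."""
    def unique_by_value(items):
        counts = Counter(it[feature] for it in items)
        return {it[feature]: it for it in items if counts[it[feature]] == 1}
    a, b = unique_by_value(doc_a), unique_by_value(doc_b)
    return [(a[v], b[v]) for v in a if v in b]

def best_spatial_match(item, candidates, landmarks):
    """Stage 2 (assumed scoring): choose the candidate whose offsets to the
    landmarks differ least from the item's own offsets to those landmarks."""
    def cost(cand):
        return sum(
            math.hypot((item["x"] - la["x"]) - (cand["x"] - lb["x"]),
                       (item["y"] - la["y"]) - (cand["y"] - lb["y"]))
            for la, lb in landmarks)
    return min(candidates, key=cost)

# Two hypothetical documents with the same layout, shifted by (2, 10).
doc_a = [
    {"text": "Invoice No.", "kind": "label", "x": 10, "y": 10},
    {"text": "Total", "kind": "label", "x": 10, "y": 90},
    {"text": "100.00", "kind": "number", "x": 80, "y": 50},
    {"text": "100.00", "kind": "number", "x": 80, "y": 90},
]
doc_b = [
    {"text": "Invoice No.", "kind": "label", "x": 12, "y": 20},
    {"text": "Total", "kind": "label", "x": 12, "y": 100},
    {"text": "250.00", "kind": "number", "x": 82, "y": 60},
    {"text": "250.00", "kind": "number", "x": 82, "y": 100},
]

ranked = rank_features(doc_a, ["kind", "text"])
landmarks = find_landmarks(doc_a, doc_b, ranked[0])

# Stage 1 (assumed): a deterministic feature narrows the candidates.
target = doc_a[3]  # the amount next to "Total"
candidates = [it for it in doc_b if it["kind"] == target["kind"]]

# Stage 2: spatial coherence with the landmarks picks one candidate.
best = best_spatial_match(target, candidates, landmarks)
```

In this toy data, "text" has higher entropy than "kind", so it supplies the Landmarks ("Invoice No." and "Total"). Both numeric items in the second document survive the deterministic stage, and only the spatial stage separates them, which is the division of labor the abstract describes.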

CPC Classifications

G06V 30/41 G06N 5/022 G06V 30/418

Filing Date

2025-09-15

Application No.

19328817