Multimodal Retrieval Augmented Generation for Visually Rich Documents
Summary
USPTO published patent application US20260099698A1 filed by JPMorgan Chase Bank, N.A. The application covers multimodal retrieval augmented generation (RAG) methods for visually rich documents using a page-wise chunking algorithm. The system embeds text, spatial, and visual features from document pages into vectors, retrieves relevant chunks based on query similarity, and generates responses via a generative model.
What changed
USPTO published patent application US20260099698A1 assigned to JPMorgan Chase Bank, N.A. The application discloses methods and systems for multimodal retrieval augmented generation (RAG) that process visually rich documents using a page-wise chunking algorithm. The system generates vectors representing text, spatial, and visual features for each page chunk, retrieves relevant chunks based on query similarity, and generates responses using a generative model.
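The filing says only that a trained embedding model produces one vector per page chunk covering text, spatial, and visual features. It does not say how those features are combined. The sketch below illustrates that idea under loudly stated assumptions. The `PageChunk` structure, the hashed bag-of-words text features, the bounding-box layout statistics, the intensity-histogram visual features, and the concatenate-then-normalize fusion are all hypothetical stand-ins. None of them is the applicant's trained model.

```python
import math
import zlib
from dataclasses import dataclass


@dataclass
class PageChunk:
    """One page of a visual document (hypothetical structure, not from the filing)."""
    page_number: int
    words: list[str]                                  # extracted or OCR'd words on the page
    boxes: list[tuple[float, float, float, float]]    # (x0, y0, x1, y1) per word, normalized to [0, 1]
    pixels: list[list[int]]                           # grayscale page image, values 0-255


def _l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def text_features(words, dim=16):
    # Toy text features: hashed bag-of-words (crc32 is deterministic across runs).
    vec = [0.0] * dim
    for w in words:
        vec[zlib.crc32(w.lower().encode("utf-8")) % dim] += 1.0
    return _l2_normalize(vec)


def spatial_features(boxes):
    # Toy layout features: mean word center (x, y) and mean word width and height.
    if not boxes:
        return [0.0] * 4
    n = len(boxes)
    cx = sum((x0 + x1) / 2 for x0, _, x1, _ in boxes) / n
    cy = sum((y0 + y1) / 2 for _, y0, _, y1 in boxes) / n
    w = sum(x1 - x0 for x0, _, x1, _ in boxes) / n
    h = sum(y1 - y0 for _, y0, _, y1 in boxes) / n
    return _l2_normalize([cx, cy, w, h])


def visual_features(pixels, bins=4):
    # Toy visual features: normalized histogram of pixel intensities.
    hist = [0.0] * bins
    for row in pixels:
        for p in row:
            hist[min(p * bins // 256, bins - 1)] += 1.0
    return _l2_normalize(hist)


def embed_page(chunk):
    """Return ONE vector per page chunk combining text, spatial, and visual features."""
    fused = text_features(chunk.words) + spatial_features(chunk.boxes) + visual_features(chunk.pixels)
    return _l2_normalize(fused)  # unit length, so dot product equals cosine similarity


page = PageChunk(
    page_number=1,
    words=["Total", "revenue", "2025"],
    boxes=[(0.10, 0.05, 0.25, 0.08), (0.27, 0.05, 0.45, 0.08), (0.47, 0.05, 0.55, 0.08)],
    pixels=[[0, 255, 128], [64, 200, 30]],
)
vec = embed_page(page)
print(len(vec))
```

Normalizing each feature group before concatenating keeps any one modality from dominating purely because of its scale. A trained model would instead learn the joint representation.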
Patent application publications are informational filings that do not create compliance obligations for third parties. The publication discloses the pending claims to the public. This allows third parties to search for prior art, which they may submit to the USPTO during examination, or to prepare a challenge should a patent issue. This document imposes no regulatory action or compliance requirements.
What to do next
- Monitor for updates
Archived snapshot
Apr 13, 2026: GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
SYSTEM AND METHOD FOR MULTIMODAL RETRIEVAL AUGMENTED GENERATION FOR VISUALLY RICH DOCUMENTS
Application US20260099698A1, Kind: A1, published Apr 09, 2026
Assignee
JPMorgan Chase Bank, N.A.
Inventors
Simerjot KAUR, Zhiqiang MA, Mathieu SIBUE, Farima FARMAHINIFARAHANI, Lawrence YONG, Dongsheng WANG, Armineh NOURBAKHSH, Lucas CECCHI
Abstract
Various methods and processes, apparatuses/systems, and media for multimodal retrieval augmented generation for visually rich documents are disclosed. A processor implements a page-wise chunking algorithm to chunk a visual document into a plurality of page chunks. It inputs the plurality of page chunks into a trained embedding model and generates one vector for each page chunk, each vector representing the corresponding text, spatial, and visual features of each page of the visual document. In response to receiving a prompt of a query corresponding to the visual document, it inputs the vectors of the embedded chunks of pages retrieved from the database. It identifies the most relevant chunks of pages based on the similarities between the query prompt and the chunks of pages. Finally, in response to inputting the most relevant chunks of pages into a generative model, it generates a response to the prompt corresponding to the visual document based on the identified most relevant chunks of pages.
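The sketch below is a minimal version of the retrieval loop the abstract describes. It covers page-wise chunking (one chunk per page), one vector per chunk, a similarity search against the query, and assembly of the top-ranked pages into a prompt for a generative model. Several parts are assumptions. `embed` is a hypothetical text-only, hashed bag-of-words stand-in for the trained multimodal embedding model. An in-memory list stands in for the database. Cosine similarity is assumed as the similarity measure. The generative-model call itself is omitted; `build_prompt` only shows what would be passed to it.

```python
import math
import zlib


def embed(text, dim=64):
    """Hypothetical stand-in for the trained embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    # Both vectors are already L2-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def page_wise_chunks(pages):
    """Page-wise chunking: each page of the document becomes exactly one chunk."""
    return [{"page": i + 1, "content": text} for i, text in enumerate(pages)]


def build_index(pages):
    """Embed every page chunk once; this list stands in for the vector database."""
    return [(chunk, embed(chunk["content"])) for chunk in page_wise_chunks(pages)]


def retrieve(index, query, k=2):
    """Return the k page chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Assemble the context that would be handed to a generative model."""
    context = "\n".join(f"[page {c['page']}] {c['content']}" for c in chunks)
    return f"Answer using only these pages.\n{context}\nQuestion: {query}"


pages = [
    "Quarterly revenue table by region",
    "Board of directors list",
    "Revenue chart for Europe and Asia",
]
index = build_index(pages)
top = retrieve(index, "revenue by region", k=2)
prompt = build_prompt("revenue by region", top)
print(prompt)
```

The abstract specifies neither the similarity metric nor how many chunks are retrieved, so the cosine measure and `k` are illustrative choices. Swapping `embed` for a real multimodal encoder and the list for a vector store would not change the shape of the loop.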
CPC Classifications
G06N 3/0455 G06F 16/93 G06F 40/289 G06F 40/30
Filing Date
2024-10-04
Application No.
18906898
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The summary, classification, recommended actions, deadlines, and penalty information are AI-generated from the original text and may contain errors. Always verify against the source document.