Synthetic Corruption of Machine Learning Output
Summary
The USPTO published patent application US20260099721A1, which describes a system and method in which a corrupter module synthesizes qualified corrupt data for training a machine learning safeguard model. The corrupter receives first output data from a large language model, identifies a mapping from an entity in that output to a concept in a domain ontology, and generates training data by replacing the entity with a second entity that is mapped to a different concept complying with a predefined corruption rule. The trained safeguard model is configured to detect errors in subsequent output of the language model.
What changed
USPTO published patent application US20260099721A1 titled 'Synthetic Corruption of Machine Learning Output' on April 9, 2026. The application discloses methods for training safeguard models to detect errors in large language model outputs using synthetically generated corrupt data. The system identifies entity-concept mappings in domain ontologies and replaces entities with alternatives mapped to different concepts to create training examples.
Affected parties include developers and providers of large language models, operators of AI systems that need output validation safeguards, and organizations building quality assurance mechanisms for AI applications. The document is a pre-grant publication (kind code A1), not a granted patent. It describes a method for generating training data for LLM error detection and may be relevant to any entity developing or deploying LLM-based systems that require such detection.
Archived snapshot
Apr 18, 2026: GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
SYNTHETIC CORRUPTION OF MACHINE LEARNING OUTPUT
Application: US20260099721A1, Kind: A1, Published: Apr 09, 2026
Inventors
Rachel WITIES, Aaron BORNSTEIN, Hadas BITRAN, Ran EFRATI
Abstract
A corrupter may receive first output data of a designated domain from a large language model. The corrupter may synthesize qualified corrupt data for training a safeguard model configured to detect errors in second output of the large language model by: identifying a mapping of a first entity of the first output data to a first concept in an ontology corresponding to the designated domain, and generating the qualified corrupt data by replacing the first entity in the first output data with a second entity, wherein the second entity is mapped to a second concept of the ontology that complies with a predefined corruption rule relative to the first concept of the ontology.
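The replacement step described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the application: the toy ONTOLOGY dictionary, the ALLOWED_SWAPS table standing in for a "predefined corruption rule", the regex-based entity matching, and the names Example, complies, and corrupt. The application does not state how the ontology or the rules are encoded.

```python
import random
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy ontology for a medical domain: entity -> concept.
ONTOLOGY = {
    "ibuprofen": "nsaid",
    "naproxen": "nsaid",
    "amoxicillin": "antibiotic",
    "azithromycin": "antibiotic",
    "metformin": "antidiabetic",
}

# Hypothetical corruption rule: for each first concept, the set of second
# concepts that count as a qualified (permitted) replacement.
ALLOWED_SWAPS = {
    "nsaid": {"antibiotic"},
    "antibiotic": {"nsaid", "antidiabetic"},
}


@dataclass
class Example:
    text: str
    label: int  # 0 = original model output, 1 = synthetically corrupted


def complies(first_concept: str, second_concept: str) -> bool:
    """Return True if swapping to second_concept satisfies the toy rule."""
    allowed = ALLOWED_SWAPS.get(first_concept, set())
    return second_concept != first_concept and second_concept in allowed


def corrupt(text: str, rng: random.Random) -> Optional[Example]:
    """Replace the first replaceable entity with one from a permitted concept."""
    for match in re.finditer(r"[A-Za-z]+", text):
        first_concept = ONTOLOGY.get(match.group().lower())
        if first_concept is None:
            continue  # token is not a known entity
        candidates = sorted(
            entity for entity, concept in ONTOLOGY.items()
            if complies(first_concept, concept)
        )
        if not candidates:
            continue  # no second entity satisfies the rule for this concept
        second_entity = rng.choice(candidates)
        corrupted = text[:match.start()] + second_entity + text[match.end():]
        return Example(corrupted, 1)
    return None  # nothing in the text could be corrupted under the rule


rng = random.Random(0)
original = "Take ibuprofen twice daily for joint pain."
training_data = [Example(original, 0)]
corrupted_example = corrupt(original, rng)
if corrupted_example is not None:
    training_data.append(corrupted_example)
```

In this sketch the original output and its corrupted counterpart are paired with labels so that a downstream classifier could be trained to tell them apart. Training the safeguard model itself is out of scope here.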
CPC Classifications
G06N 3/094, G06N 3/0475
Filing Date
2024-10-31
Application No.
18933073
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The summary, classification, recommended actions, deadlines, and penalty information are AI-generated from the original text and may contain errors. Always verify against the source document.