Hardening Machine Learning Models Against Prompt Input Attacks That Trigger Trojans
Summary
Sophos Limited has filed USPTO Application US20260111541A1 for a method that hardens pre-trained large language models (LLMs) against prompt input attacks that trigger neural network Trojans. The method identifies the neurons associated with a Trojan trigger by comparing each neuron's activity level in response to known test tokens against a baseline, then selectively modifies those neurons' weights so that the likelihood of a malicious response falls below a threshold value. The application names five inventors and was published on April 23, 2026, following a filing date of December 30, 2024.
“The method further includes modifying the respective weights of one or more neurons in the subset of the neurons in the LLM such that, after the modifying, a likelihood that the LLM generates the resulting malicious response to the malicious prompt is below a threshold likelihood value.”
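The weight-modification step quoted above can be illustrated with a toy model. This is a minimal sketch and not the method claimed in the application: the single logistic output unit, the shrink-by-half rule, and every name and number here (malicious_likelihood, suppress, the sample weights, the 0.2 threshold) are assumptions chosen for illustration. It also assumes the subset of suspect neurons has already been identified.

```python
import math

def malicious_likelihood(acts, out_weights, bias):
    """Toy stand-in for the likelihood of the malicious response (logistic output)."""
    z = sum(a * w for a, w in zip(acts, out_weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def suppress(acts, out_weights, bias, subset, threshold, shrink=0.5, max_steps=50):
    """Shrink only the weights of neurons in `subset` until the likelihood
    drops below `threshold` (or max_steps is reached, with no guarantee)."""
    weights = list(out_weights)
    for _ in range(max_steps):
        if malicious_likelihood(acts, weights, bias) < threshold:
            break
        for i in subset:
            weights[i] *= shrink
    return weights

# Hypothetical activations on a malicious prompt and hypothetical output weights.
acts = [0.6, 0.5, 4.1]
out_weights = [0.1, -0.2, 1.5]
bias = -2.0

before = malicious_likelihood(acts, out_weights, bias)
new_weights = suppress(acts, out_weights, bias, subset=[2], threshold=0.2)
after = malicious_likelihood(acts, new_weights, bias)
print(before > 0.2, after < 0.2)
```

Only the weight at index 2 is changed, which mirrors the idea of modifying a selected subset of neurons rather than retraining the whole model.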
About this source
USPTO classification G06N covers computer systems based on specific computational models: neural networks, knowledge representation, fuzzy logic, expert systems, and evolutionary algorithms. With the AI patent boom, this is one of the most-filed application classes in the office. Every newly published application in G06N lands in this feed, around 230 a month. Patent applications typically publish about 18 months after their earliest filing date, so this feed reveals what AI labs and companies were working on roughly a year and a half earlier. Watch this if you compete in machine learning, prepare freedom-to-operate analyses, scout acquisition targets in AI infrastructure, or track which research groups are converting publications to patents. GovPing pulls each application with the filing number, title, applicant, and abstract.
What changed
Sophos Limited filed USPTO Patent Application US20260111541A1 for a method to harden pre-trained LLMs against prompt input attacks that trigger neural network Trojans. The method adjusts neuron weights to cause the LLM to generate a known malicious response to a test prompt, identifies a subset of neurons by comparing activity levels against a baseline, and modifies the weights of neurons in that subset to reduce the likelihood of a malicious response below a threshold value.
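The neuron-identification step described above, comparing activity on a test prompt against a baseline, can be sketched on a toy layer. This is an illustrative sketch under stated assumptions, not the application's implementation: the ReLU layer, the fixed margin, and all names and values (hidden_activations, flag_neurons, the sample weights and inputs) are hypothetical.

```python
def hidden_activations(weights, inputs):
    """ReLU activations of one toy hidden layer (one weight row per neuron)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

def flag_neurons(test_acts, baseline_acts, margin):
    """Indices of neurons whose activity on the test prompt exceeds
    their baseline activity by more than `margin`."""
    return [i for i, (t, b) in enumerate(zip(test_acts, baseline_acts))
            if t - b > margin]

# Hypothetical toy layer: 3 neurons, 2 input features.
weights = [[0.5, 0.1], [0.2, 0.3], [0.1, 4.0]]
baseline_input = [1.0, 0.0]  # stand-in for a benign prompt
test_input = [1.0, 1.0]      # stand-in for a prompt containing known test tokens

baseline_acts = hidden_activations(weights, baseline_input)
test_acts = hidden_activations(weights, test_input)
suspect = flag_neurons(test_acts, baseline_acts, margin=1.0)
print(suspect)  # prints [2]
```

In this toy setup only the third neuron responds strongly to the second input feature, so it is the only one flagged as a candidate for weight modification.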
Technology companies developing or deploying LLMs should monitor this application, as it describes a potential defensive methodology against adversarial prompt attacks. It sets out a testing-and-modification approach to identifying and neutralising Trojan triggers within neural network architectures, which may inform future security hardening practices for AI systems. Note that this is a published application, not a granted patent.
Archived snapshot
Apr 24, 2026: GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
Hardening Machine Learning Models Against Prompt Input Attacks That Trigger Trojans
Application: US20260111541A1, Kind: A1, Published: Apr 23, 2026
Assignee
Sophos Limited
Inventors
Tamás Vörös, Sean Paul Bergeron, Ben Uri Gelman, Adarsh Dinesh Kyadige, Tamas Bence Nyiri
Abstract
A method includes obtaining a pre-trained LLM that generates a resulting malicious response to a malicious prompt input to the LLM. The method further includes adjusting a respective weight of neurons of the LLM to cause the LLM to generate a known malicious response to a test prompt input to the LLM, where the test prompt includes a plurality of known test tokens. The method further includes identifying a subset of the neurons based on comparing a respective activity level of each neuron in response to the test prompt with a baseline activity level. The method further includes modifying the respective weights of one or more neurons in the subset of the neurons in the LLM such that, after the modifying, a likelihood that the LLM generates the resulting malicious response to the malicious prompt is below a threshold likelihood value.
CPC Classifications
G06F 21/554 G06N 3/0475 G06N 3/094 G06F 2221/033
Filing Date
2024-12-30
Application No.
19005933
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The summary, classification, recommended actions, deadlines, and penalty information are AI-generated from the original text and may contain errors. Always verify against the source document.