Adversarial attack detection based on similarity
Assignee
HiddenLayer, Inc.
Inventors
Julian Collado Umana, Andrew Davis
Abstract
A query is received that is to be input into a machine learning model (or other artificial intelligence model). A first distance-based similarity analysis technique is then used to determine a plurality of historical queries of the machine learning model that meet first criteria relative to the query. Each of the historical queries has a known output from the machine learning model. An output of the machine learning model responsive to the query is received. Next, a second distance-based similarity analysis technique is used to determine whether the output meets second criteria relative to each of the known outputs corresponding to the historical queries. This determination characterizes whether the query is likely to cause the machine learning model to behave in an undesired manner, and it can be provided to a consuming application or process. Related apparatus, systems, and techniques are also described.
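The two-stage check in the abstract can be illustrated with a minimal sketch. The abstract does not say how queries and outputs are represented, which distance measures are used, what the thresholds are, or how the per-neighbor comparisons are combined. Everything below is an assumption for illustration: queries and outputs are taken to be numeric vectors, cosine distance stands in for the first technique, Euclidean distance stands in for the second, the threshold values are arbitrary, and the names `HistoricalQuery` and `is_likely_adversarial` are hypothetical. The intuition is that similar queries should yield similar outputs, so an output far from every known output of similar past queries is treated as suspicious.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (assumed stand-in for the first distance technique)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat zero vectors as maximally dissimilar in direction
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """L2 distance (assumed stand-in for the second distance technique)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class HistoricalQuery:
    """Hypothetical record of a past query and the model's known output for it."""
    query_embedding: List[float]
    known_output: List[float]


def is_likely_adversarial(
    query_embedding: Vector,
    output: Vector,
    history: List[HistoricalQuery],
    query_threshold: float = 0.1,   # hypothetical "first criteria"
    output_threshold: float = 0.5,  # hypothetical "second criteria"
) -> Optional[bool]:
    # Stage 1: keep historical queries close enough to the new query.
    neighbors = [
        h for h in history
        if cosine_distance(query_embedding, h.query_embedding) <= query_threshold
    ]
    if not neighbors:
        return None  # no comparable history, so no determination is made

    # Stage 2: compare the new output with each neighbor's known output.
    diverges = [
        euclidean_distance(output, h.known_output) > output_threshold
        for h in neighbors
    ]
    # Assumed decision rule: flag only if the output diverges from every
    # similar query's known output.
    return all(diverges)
```

For example, with a history containing a query `[1.0, 0.0]` whose known output was `[0.9, 0.1]`, a new query `[0.99, 0.01]` producing `[0.88, 0.12]` would not be flagged, while the same query producing `[0.05, 0.95]` would be. Requiring divergence from all neighbors rather than any one is a conservative choice made here to reduce false positives; the abstract leaves this open.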
CPC Classifications
Filing Date
2025-07-15
Application No.
19270337
Claims
30