GUARDING MULTIMODAL ARTIFICIAL INTELLIGENCE SYSTEMS FROM MALICIOUS PROMPT ATTACKS

Application US20260087406A1 Kind: A1 Mar 26, 2026

Assignee

Microsoft Technology Licensing, LLC

Inventors

Reshmi GHOSH, Vitor Rocha De CARVALHO, Robert SIM, Emily LAWTON, Jack Wilson STOKES, Lukas WUTSCHITZ, Ahmed Mohamed Gamal SALEM, Xuefeng DU

Abstract

A data processing system implements obtaining a plurality of unlabeled user prompts including an unknown mixture of malicious prompts and benign prompts; analyzing each unlabeled user prompt using a multimodal vision language model to obtain embeddings representing each unlabeled user prompt; analyzing the embeddings to determine representation of each unlabeled user prompt of the plurality of unlabeled user prompts in a latent space; determining a first region of the latent space associated with benign user prompts and a second region of the latent space associated with malicious user prompts; generating labeled training data by labeling each unlabeled user prompt of the plurality of unlabeled user prompts with an indication whether each unlabeled user prompt is a benign user prompt falling with the first region or a malicious user prompt falling within the second region; and training a prompt classifier using the labeled training data.

CPC Classifications

G06N 20/00

Filing Date

2024-12-19

Application No.

18988604

View original document →