System and method for robust natural language classification under character encoding

Grant US12585765B2 Kind: B2 Mar 24, 2026

Assignee

Barracuda Networks, Inc.

Inventors

Christopher L. Sawtelle

Abstract

A new approach is proposed to support robust natural language classification under character encoding. A plurality of images that represent a plurality of characters under various language encoding schemes for a target language character are accepted and utilized to create a distribution of text similarity probabilities for the plurality of characters likely to be swapped/replaced/substituted with the target language character to trick a human user. The distribution of text similarity probabilities is then applied against a true text corpus comprising a set of real/actual texts to generate a synthetic text corpus that further includes a set of characters being swapped with one or more of the plurality of characters based on the distribution of text similarity probabilities. The synthetic text corpus is then utilized to train one or more NLP models, which are then utilized to correctly classify and recognize an incoming electronic message that contains a character swap attack.
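The corpus-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `SIMILARITY` table, its probability values, the `swap_rate` parameter, and both function names are hypothetical. In the abstract the probabilities are derived from character images, whereas here they are hard-coded for illustration.

```python
import random

# Hypothetical similarity distributions: for each target character, the
# look-alike characters and the probability of choosing each as a substitute.
# The values are made up; the abstract derives them from character images.
SIMILARITY = {
    "a": {"\u0430": 0.7, "\u0251": 0.3},  # Cyrillic a, Latin alpha
    "e": {"\u0435": 1.0},                 # Cyrillic ie
    "o": {"\u043e": 0.6, "\u03bf": 0.4},  # Cyrillic o, Greek omicron
}


def swap_characters(text, similarity, swap_rate, rng):
    """Replace swappable characters with look-alikes drawn from `similarity`."""
    out = []
    for ch in text:
        candidates = similarity.get(ch)
        # swap_rate is an assumed knob controlling how often a swap happens.
        if candidates and rng.random() < swap_rate:
            chars = list(candidates)
            weights = [candidates[c] for c in chars]
            out.append(rng.choices(chars, weights=weights, k=1)[0])
        else:
            out.append(ch)
    return "".join(out)


def build_synthetic_corpus(true_corpus, similarity, swap_rate=0.3, seed=0):
    """Apply character swaps to every text in the true corpus."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [swap_characters(t, similarity, swap_rate, rng) for t in true_corpus]
```

Under these assumptions, calling `build_synthetic_corpus(["please reset your password"], SIMILARITY)` would return texts of the same length in which some Latin letters are replaced by visually similar code points. Such texts could then be paired with the original labels to train a classifier.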

CPC Classifications

G06F 21/554 G06F 40/126 G06F 40/279 G06F 2221/034 G06N 20/00

Filing Date

2023-09-22

Application No.

18371878

Claims

17