System and method for short text matching
Assignee
Intuit Inc.
Inventors
Aleksandr Kim, Rineke van Noort, Yuting Lu, Ben Yi
Abstract
A short text matching system and method include a pre-generated dictionary of n-gram tokens of a selected length, with corresponding embeddings produced by a fine-tuned transformer model, and further include a one-layer transformer model for inference. The dictionary is produced by fine-tuning a pretrained transformer model on a domain-specific short text training dataset. The length of the n-gram tokens is selected based on how the variance of the embeddings produced by the fine-tuned transformer model depends on the n-gram length. Domain-specific input text, including query text and target text, is received, and n-gram tokens of the selected length are produced from it. The embedding corresponding to each n-gram token is determined from the dictionary, along with a corresponding positional embedding. The n-gram embeddings and positional embeddings are provided to the one-layer transformer model, which produces a text matching result, such as a similarity score or a classification.
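The inference path described in the abstract (n-gram tokenization, dictionary lookup, positional embeddings, one transformer layer, matching result) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: character-level n-grams, the length `NGRAM_LEN = 3`, the width `DIM = 8`, sinusoidal positions, random vectors standing in for embeddings from the fine-tuned model, an attention layer with identity projections in place of trained weights, and cosine similarity over separately encoded query and target. All function names are hypothetical.

```python
import math
import random

DIM = 8        # embedding width (illustrative assumption)
NGRAM_LEN = 3  # stands in for the "selected length" (assumed value)


def ngrams(text, n=NGRAM_LEN):
    """Split text into overlapping character n-grams (character level is an assumption)."""
    text = text.lower()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_dictionary(corpus, dim=DIM, seed=0):
    """Stand-in for the pre-generated dictionary: random vectors replace the
    embeddings that a fine-tuned transformer model would produce offline."""
    rng = random.Random(seed)
    vocab = sorted({g for text in corpus for g in ngrams(text)})
    return {g: [rng.gauss(0.0, 1.0) for _ in range(dim)] for g in vocab}


def positional(pos, dim=DIM):
    """Sinusoidal positional embedding (one common choice, assumed here)."""
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim))
            for i in range(dim)]


def embed(text, dictionary, dim=DIM):
    """Look up each n-gram and add its positional embedding.
    N-grams missing from the dictionary map to a zero vector."""
    rows = []
    for pos, gram in enumerate(ngrams(text)):
        token = dictionary.get(gram, [0.0] * dim)
        rows.append([t + p for t, p in zip(token, positional(pos, dim))])
    return rows


def self_attention(rows):
    """Single-head scaled dot-product self-attention with identity projections.
    A trained layer would apply learned query, key, and value matrices."""
    dim = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in rows]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, rows))
                    for j in range(dim)])
    return out


def encode(text, dictionary, dim=DIM):
    """Mean-pool the attended n-gram representations into one vector."""
    rows = embed(text, dictionary, dim)
    if not rows:
        return [0.0] * dim
    attended = self_attention(rows)
    return [sum(col) / len(attended) for col in zip(*attended)]


def similarity(query, target, dictionary):
    """Cosine similarity between the encoded query and target texts."""
    a, b = encode(query, dictionary), encode(target, dictionary)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    corpus = ["pay invoice", "send invoice", "payroll tax"]
    dictionary = build_dictionary(corpus)
    print(similarity("pay invoice", "send invoice", dictionary))
```

Because the dictionary lookup replaces a full encoder pass, the only model evaluated at inference time in this sketch is the single attention layer, which is the cost saving the abstract points to. With random stand-in embeddings the scores carry no semantic meaning; they only show how data flows through the steps.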
CPC Classifications
Filing Date
2025-05-30
Application No.
19224535
Claims
20