Systems and methods for text-to-image generation using language models
Grant
US12585919B2
Kind: B2
Mar 24, 2026
Assignee
Salesforce, Inc.
Inventors
Ning Yu, Can Qin, Chen Xing, Shu Zhang, Stefano Ermon, Caiming Xiong, Ran Xu
Abstract
Embodiments described herein provide a mechanism for replacing existing text encoders in text-to-image generation models with more powerful pre-trained language models. Specifically, a translation network is trained to map features output by the pre-trained language model into the feature space of the target text encoder. The training preserves the rich structure of the pre-trained language model while allowing it to operate within the text-to-image generation model. The resulting modularized text-to-image model receives a prompt and generates an image representing the features contained in the prompt.
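The abstract does not specify the translation network's architecture or its training loss. The following is only a minimal sketch of the general idea, under stated assumptions. It assumes a linear translation network (weights `W`, bias `b`) trained by plain gradient descent on a mean-squared-error alignment loss between paired features. It also uses synthetic NumPy arrays as stand-ins for the pre-trained language model's prompt features and the target text encoder's features. The function name `train_translation_network` and all dimensions are hypothetical and do not come from the patent.

```python
import numpy as np

def train_translation_network(lm_feats, target_feats, lr=0.5, steps=500, seed=0):
    """Hypothetical sketch: fit a linear map W, b so that lm_feats @ W + b
    approximates target_feats under a mean-squared-error loss.
    The language-model features are treated as fixed (frozen) inputs."""
    rng = np.random.default_rng(seed)
    d_in = lm_feats.shape[1]
    d_out = target_feats.shape[1]
    W = rng.normal(scale=0.01, size=(d_in, d_out))  # small random init
    b = np.zeros(d_out)
    losses = []
    for _ in range(steps):
        err = lm_feats @ W + b - target_feats
        losses.append(float(np.mean(err ** 2)))
        # Gradients of mean(err**2) with respect to W and b
        grad_W = 2.0 * lm_feats.T @ err / err.size
        grad_b = 2.0 * err.sum(axis=0) / err.size
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b, losses

# Synthetic stand-ins (assumption): 256 prompts, 16-dim LM features,
# 8-dim target-encoder features related by an unknown linear map.
rng = np.random.default_rng(1)
lm = rng.normal(size=(256, 16))
true_map = rng.normal(size=(16, 8))
target = lm @ true_map

W, b, losses = train_translation_network(lm, target)
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.2e}")
```

A real system would likely use a deeper network and additional objectives to preserve the language model's feature structure. The linear MSE version is chosen here only because it is small enough to verify by inspection.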
CPC Classifications
G06N 3/0455
G06T 5/70
G06T 2207/20084
Filing Date
2023-01-31
Application No.
18162535
Claims
20