Systems and methods for text-to-image generation using language models

Grant US12585919B2 Kind: B2 Mar 24, 2026

Assignee

Salesforce, Inc.

Inventors

Ning Yu, Can Qin, Chen Xing, Shu Zhang, Stefano Ermon, Caiming Xiong, Ran Xu

Abstract

Embodiments described herein provide a mechanism for replacing existing text encoders in text-to-image generation models with more powerful pre-trained language models. Specifically, a translation network is trained to map features from the pre-trained language model output into the space of the target text encoder. The training preserves the rich structure of the pre-trained language model while allowing it to operate within the text-to-image generation model. The resulting modularized text-to-image model receives a prompt and generates an image representing the features contained in the prompt.
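
The translation step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the translation network is a single linear map, that it is fit with a mean-squared-error loss by plain stochastic gradient descent, and that paired features for the same prompt are available from both models. The feature vectors are synthetic stand-ins, and every name (`train_translator`, `make_toy_pairs`, the dimensions) is hypothetical.

```python
import random

def matvec(W, x):
    """Apply the translation matrix W (tgt_dim x src_dim) to a feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mse(W, pairs):
    """Mean squared error between translated source features and target features."""
    total, count = 0.0, 0
    for src, tgt in pairs:
        pred = matvec(W, src)
        total += sum((p - t) ** 2 for p, t in zip(pred, tgt))
        count += len(tgt)
    return total / count

def train_translator(pairs, src_dim, tgt_dim, lr=0.05, epochs=200, seed=0):
    """Fit a linear translator with per-sample gradient descent on squared error.

    Assumption: a linear map and an MSE objective; the source document does not
    specify the architecture or the loss.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(src_dim)] for _ in range(tgt_dim)]
    for _ in range(epochs):
        for src, tgt in pairs:
            pred = matvec(W, src)
            for i in range(tgt_dim):
                err = pred[i] - tgt[i]
                for j in range(src_dim):
                    # Gradient of (pred_i - tgt_i)^2 with respect to W[i][j].
                    W[i][j] -= lr * 2.0 * err * src[j]
    return W

def make_toy_pairs(n, src_dim, tgt_dim, seed=1):
    """Synthetic stand-ins for (language-model feature, target-encoder feature) pairs."""
    rng = random.Random(seed)
    true_map = [[rng.uniform(-1, 1) for _ in range(src_dim)] for _ in range(tgt_dim)]
    pairs = []
    for _ in range(n):
        src = [rng.uniform(-1, 1) for _ in range(src_dim)]
        pairs.append((src, matvec(true_map, src)))
    return pairs

SRC_DIM, TGT_DIM = 4, 3  # toy sizes; real feature spaces are far larger
pairs = make_toy_pairs(n=32, src_dim=SRC_DIM, tgt_dim=TGT_DIM)

# Baseline: an all-zero translator that ignores the language-model features.
zero_W = [[0.0] * SRC_DIM for _ in range(TGT_DIM)]
baseline_loss = mse(zero_W, pairs)

W = train_translator(pairs, SRC_DIM, TGT_DIM)
final_loss = mse(W, pairs)
print(f"baseline loss: {baseline_loss:.6f}, loss after training: {final_loss:.6f}")
```

In this sketch, only the translator's weights are updated, so the two feature extractors act as fixed functions. After training, the translated features could be passed to the image generator in place of the original text encoder's output.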

CPC Classifications

G06N 3/0455; G06T 5/70; G06T 2207/20084

Filing Date

2023-01-31

Application No.

18162535

Claims