SYSTEM AND METHOD FOR PARALLELIZING LORAS BY MAXIMIZING GPU UTILIZATION
Application
US20260080276A1
Kind: A1
Published: Mar 19, 2026
Inventors
Asser Mazin, Mohamed Hatem
Abstract
One example method includes receiving multiple LoRA (low-rank adaptation) models, batching the LoRA models together to generate one or more batches of the LoRA models, creating a respective queue for each of the batches of the LoRA models, calling the LoRA models in the sequence in which the LoRA models were batched, and, using only a single GPU (graphics processing unit), performing simultaneous parallel inferencing on all of the LoRA models.
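The abstract's flow (batch requests per adapter, keep a queue per batch, then serve each batch in one fused pass on a single device) might be sketched as below. All names, shapes, and the toy forward pass are illustrative assumptions, not details from the patent; a LoRA here is a low-rank pair (A, B) that adapts a shared base weight W as W + A @ B.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

W = rng.normal(size=(d, d))  # base weight shared by all adapters

# Hypothetical adapters: each LoRA is a low-rank pair (A, B),
# so the adapted weight is W + A @ B.
adapters = {name: (rng.normal(size=(d, r)), rng.normal(size=(r, d)))
            for name in ("lora_a", "lora_b", "lora_c")}

# Batch incoming requests by adapter, one queue per batch.
requests = [("lora_a", rng.normal(size=d)) for _ in range(3)] + \
           [("lora_b", rng.normal(size=d)) for _ in range(2)]
batches = {}
for name, x in requests:
    batches.setdefault(name, deque()).append(x)

def run_batch(name, queue):
    """Serve one adapter's queue as a single batched matmul, so every
    request in the batch is computed in parallel on one device."""
    A, B = adapters[name]
    X = np.stack(list(queue))        # (batch, d)
    return X @ W + (X @ A) @ B       # base path + low-rank update

# Call the adapters in the order they were batched.
outputs = {name: run_batch(name, q) for name, q in batches.items()}
for name, Y in outputs.items():
    print(name, Y.shape)
```

Folding the low-rank update into the batched forward pass, rather than materializing W + A @ B per adapter, is what lets many adapters share one resident base model on the same GPU.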
CPC Classifications
G06N 5/04
Filing Date
2024-09-18
Application No.
18889195