SYSTEM AND METHOD FOR PARALLELIZING LORAS BY MAXIMIZING GPU UTILIZATION
Application
US20260080276A1
Kind: A1
Published: Mar 19, 2026
Inventors
Asser Mazin, Mohamed Hatem
Abstract
One example method includes receiving multiple LoRA (low-rank adaptation) models, batching the LoRA models together to generate one or more batches of the LoRA models, creating a respective queue for each of the batches of the LoRA models, calling the LoRA models in the sequence in which the LoRA models were batched, and, using only a single GPU (graphics processing unit), performing simultaneous parallel inferencing on all of the LoRA models.
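The abstract's flow (batch requests per adapter, keep a queue per batch, then serve each batch in one fused pass on a single device) might be sketched as below. All names, shapes, and the toy forward pass are illustrative assumptions, not details from the patent; a LoRA here is a low-rank pair (A, B) that adapts a shared base weight W as W + A @ B.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

W = rng.normal(size=(d, d))  # base weight shared by all adapters

# Hypothetical adapters: each LoRA is a low-rank pair (A, B),
# so the adapted weight is W + A @ B.
adapters = {name: (rng.normal(size=(d, r)), rng.normal(size=(r, d)))
            for name in ("lora_a", "lora_b", "lora_c")}

# Batch incoming requests by adapter, one queue per batch.
requests = [("lora_a", rng.normal(size=d)) for _ in range(3)] + \
           [("lora_b", rng.normal(size=d)) for _ in range(2)]
batches = {}
for name, x in requests:
    batches.setdefault(name, deque()).append(x)

def run_batch(name, queue):
    """Serve one adapter's queue as a single batched matmul, so every
    request in the batch is computed in parallel on one device."""
    A, B = adapters[name]
    X = np.stack(list(queue))        # (batch, d)
    return X @ W + (X @ A) @ B       # base path + low-rank update

# Call the adapters in the order they were batched.
outputs = {name: run_batch(name, q) for name, q in batches.items()}
for name, Y in outputs.items():
    print(name, Y.shape)
```

Folding the low-rank update into the batched forward pass, rather than materializing W + A @ B per adapter, is what lets many adapters share one resident base model on the same GPU.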
CPC Classifications
G06N 5/04
Filing Date
2024-09-18
Application No.
18889195