Load testing and performance benchmarking for large language models using a cloud computing platform
Assignee
MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors
Sanjay Ramanujan, Rakesh Kelkar, Hari Krishnan Srinivasan, Karthik Raman, Hema Vishnu Pola, Sagar Taneja, Mradul Karmodiya
Abstract
The techniques disclosed herein enable systems to perform repeatable and iterative load testing and performance benchmarking for artificial intelligence models deployed in a cloud computing environment. This is achieved by utilizing load profiles, and representative workloads generated from those load profiles, to evaluate an artificial intelligence model under various workload contexts. The representative workload is then executed by the artificial intelligence model utilizing available computing infrastructure. Performance metrics are extracted from the execution and analyzed to provide insight into various performance dynamics, such as the relationship between latency and data throughput. In addition, load profiles and input datasets are dynamically adjusted to evaluate different scenarios and use cases, enabling the system to automatically test the artificial intelligence model across diverse applications. Furthermore, by comparing various iterations of the artificial intelligence model, a quality gate can be constructed to enforce a consistent, high-quality user experience.
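The pipeline the abstract describes — a load profile driving a representative workload, metric extraction from execution, and a quality gate comparing model iterations — can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LoadProfile` fields, metric names, and regression threshold are all assumptions chosen for the example.

```python
import time
import statistics
from dataclasses import dataclass

@dataclass
class LoadProfile:
    """Hypothetical load profile: request volume and prompt size (illustrative fields)."""
    num_requests: int
    prompt_tokens: int

def generate_workload(profile: LoadProfile) -> list[str]:
    """Generate a representative workload (synthetic prompts) from the load profile."""
    return ["tok " * profile.prompt_tokens for _ in range(profile.num_requests)]

def run_benchmark(model_fn, workload: list[str]) -> dict:
    """Execute the workload against a model and extract latency/throughput metrics."""
    latencies = []
    start = time.perf_counter()
    for prompt in workload:
        t0 = time.perf_counter()
        model_fn(prompt)  # stand-in for an inference call to the deployed model
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "p50_latency_s": statistics.median(latencies),
        "throughput_rps": len(workload) / elapsed,
    }

def quality_gate(baseline: dict, candidate: dict, max_regression: float = 0.10) -> bool:
    """Pass a new model iteration only if median latency regresses less than 10%
    versus the baseline iteration (threshold is an assumed example value)."""
    return candidate["p50_latency_s"] <= baseline["p50_latency_s"] * (1 + max_regression)
```

Iterating over multiple `LoadProfile` instances (varying request counts and prompt sizes) would correspond to the dynamic adjustment of load profiles across scenarios, and running `quality_gate` between successive model versions corresponds to the comparison of model iterations described above.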
CPC Classifications
Filing Date
2022-10-27
Application No.
17975506
Claims
20