Load testing and performance benchmarking for large language models using a cloud computing platform
Assignee
MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors
Sanjay Ramanujan, Rakesh Kelkar, Hari Krishnan Srinivasan, Karthik Raman, Hema Vishnu Pola, Sagar Taneja, Mradul Karmodiya
Abstract
The techniques disclosed herein enable systems to perform repeatable and iterative load testing and performance benchmarking for artificial intelligence models deployed in a cloud computing environment. This is achieved by utilizing load profiles, and representative workloads generated from those load profiles, to evaluate an artificial intelligence model under various workload contexts. The representative workload is then executed by the artificial intelligence model utilizing available computing infrastructure. Performance metrics are extracted from the execution and analyzed to provide insight into various performance dynamics, such as the relationship between latency and data throughput. In addition, load profiles and input datasets are dynamically adjusted to evaluate different scenarios and use cases, enabling the system to automatically test the artificial intelligence model across diverse applications. Furthermore, by comparing various iterations of the artificial intelligence model, a quality gate can be constructed to enforce a consistent, high-quality user experience.
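The pipeline the abstract describes — a load profile driving a representative workload, metric extraction from execution, and a quality gate comparing model iterations — can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LoadProfile` fields, metric names, and regression threshold are all assumptions chosen for the example.

```python
import time
import statistics
from dataclasses import dataclass

@dataclass
class LoadProfile:
    """Hypothetical load profile: request volume and prompt size (illustrative fields)."""
    num_requests: int
    prompt_tokens: int

def generate_workload(profile: LoadProfile) -> list[str]:
    """Generate a representative workload (synthetic prompts) from the load profile."""
    return ["tok " * profile.prompt_tokens for _ in range(profile.num_requests)]

def run_benchmark(model_fn, workload: list[str]) -> dict:
    """Execute the workload against a model and extract latency/throughput metrics."""
    latencies = []
    start = time.perf_counter()
    for prompt in workload:
        t0 = time.perf_counter()
        model_fn(prompt)  # stand-in for an inference call to the deployed model
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "p50_latency_s": statistics.median(latencies),
        "throughput_rps": len(workload) / elapsed,
    }

def quality_gate(baseline: dict, candidate: dict, max_regression: float = 0.10) -> bool:
    """Pass a new model iteration only if median latency regresses less than 10%
    versus the baseline iteration (threshold is an assumed example value)."""
    return candidate["p50_latency_s"] <= baseline["p50_latency_s"] * (1 + max_regression)
```

Iterating over multiple `LoadProfile` instances (varying request counts and prompt sizes) would correspond to the dynamic adjustment of load profiles across scenarios, and running `quality_gate` between successive model versions corresponds to the comparison of model iterations described above.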
CPC Classifications
Filing Date
2022-10-27
Application No.
17975506
Claims
20