DEEP NEURAL NETWORKS (DNN) INFERENCE USING PRACTICAL EARLY EXIT NETWORKS
Application
US20260086912A1
Kind: A1
Mar 26, 2026
Inventors
Anand PADMANABHA IYER, Swapnil Sunilkumar GANDHI
Abstract
The present disclosure relates to methods and systems for providing inferences using machine learning systems. The methods and systems receive a load forecast for processing requests by a machine learning model and split the machine learning model into a plurality of machine learning model portions based on the load forecast. The methods and systems determine a batch size for the requests for the machine learning model portions. The methods and systems use one or more available resources to execute the plurality of machine learning model portions to process the requests and generate inferences for the requests.
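The flow described in the abstract (load forecast, then model split, then batch size) can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how the split or the batch size is computed, so the names `Plan`, `split_layers`, and `plan_from_forecast`, the parameters `rps_per_portion`, `max_batch`, and `latency_budget_s`, and both heuristics are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    portions: List[List[str]]  # each portion is a contiguous run of layer names
    batch_size: int


def split_layers(layers: List[str], num_portions: int) -> List[List[str]]:
    """Split layers into contiguous, near-equal portions (illustrative only)."""
    # Clamp so there is at least one portion and no portion is empty.
    num_portions = max(1, min(num_portions, len(layers)))
    base, extra = divmod(len(layers), num_portions)
    portions, start = [], 0
    for i in range(num_portions):
        size = base + (1 if i < extra else 0)  # earlier portions absorb the remainder
        portions.append(layers[start:start + size])
        start += size
    return portions


def plan_from_forecast(
    layers: List[str],
    forecast_rps: float,
    rps_per_portion: float = 100.0,   # assumed capacity figure, not from the source
    max_batch: int = 32,              # assumed upper bound, not from the source
    latency_budget_s: float = 0.05,   # assumed batching window, not from the source
) -> Plan:
    # Assumed heuristic: a higher forecast load yields more model portions.
    num_portions = math.ceil(forecast_rps / rps_per_portion)
    portions = split_layers(layers, num_portions)
    # Assumed heuristic: batch the requests expected within the latency budget.
    batch_size = max(1, min(max_batch, int(forecast_rps * latency_budget_s)))
    return Plan(portions=portions, batch_size=batch_size)
```

For example, `plan_from_forecast([f"layer{i}" for i in range(6)], forecast_rps=250.0)` splits the six layers into three portions of two layers each. Executing the portions on available resources is left out of the sketch, since the abstract gives no detail on scheduling.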
CPC Classifications
G06F 11/3442
G06F 9/505
G06N 5/043
G06N 20/00
Filing Date
2025-11-25
Application No.
19400394