DNN Inference Optimization Using Practical Early Exit Networks

ChangeBridge: Patent Apps - AI & Computing (G06N)

Published November 25th, 2025

Detected April 1st, 2026

Summary

USPTO published patent application US20260086912A1, which discloses methods and systems for optimizing deep neural network (DNN) inference using early exit networks. The invention dynamically splits machine learning models based on processing load forecasts and adapts batch sizes to improve computational efficiency. Application No. 19400394 was filed November 25, 2025.


What changed

The patent application discloses methods for optimizing deep neural network inference through practical early exit networks. The system receives a load forecast for incoming processing requests and dynamically splits the ML model into multiple portions based on that forecast. A batch size is determined for the requests handled by each model portion, and the available computational resources are allocated to execute the portions, process the requests, and generate inferences.
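The application, as summarized here, does not say how split points or batch sizes are computed. The following is a minimal Python sketch under stated assumptions, not the inventors' method. It assumes each portion can serve a fixed request rate on one resource, picks the number of portions from the forecast rate, cuts the layers into near-equal contiguous ranges, and sizes each portion's batch from the arrival rate it should see once some requests have already left at earlier exits. The names `plan_inference`, `Plan`, `portion_capacity_rps`, `exit_rate`, and `max_wait_s` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Plan:
    """Hypothetical execution plan with one entry per model portion."""
    layer_ranges: List[Tuple[int, int]]  # [start, end) layer indices of each portion
    batch_sizes: List[int]               # batch size chosen for each portion


def plan_inference(num_layers: int, forecast_rps: float,
                   portion_capacity_rps: float, max_portions: int,
                   exit_rate: float, max_wait_s: float,
                   max_batch: int) -> Plan:
    # Assumption: each portion, on its own resource, sustains about
    # portion_capacity_rps, so a higher forecast calls for more portions.
    k = math.ceil(forecast_rps / portion_capacity_rps)
    k = max(1, min(k, max_portions, num_layers))

    # Cut the layer stack into k contiguous, near-equal portions.
    base, extra = divmod(num_layers, k)
    ranges, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end

    # Assumption: a fixed fraction exit_rate of the requests reaching each
    # exit leaves there, so portion i sees forecast_rps * (1 - exit_rate)**i.
    # Batch size = requests expected to arrive within max_wait_s, clamped.
    batches = []
    for i in range(k):
        rate_i = forecast_rps * (1.0 - exit_rate) ** i
        batches.append(max(1, min(math.floor(rate_i * max_wait_s), max_batch)))
    return Plan(ranges, batches)


if __name__ == "__main__":
    plan = plan_inference(num_layers=12, forecast_rps=400.0,
                          portion_capacity_rps=150.0, max_portions=4,
                          exit_rate=0.5, max_wait_s=0.05, max_batch=32)
    print(plan.layer_ranges)  # -> [(0, 4), (4, 8), (8, 12)]
    print(plan.batch_sizes)   # -> [20, 10, 5]
```

Sizing a batch by the requests expected to arrive within a waiting budget is one common heuristic for trading latency against throughput; the application's own criteria may differ.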

This is a patent application publication with no immediate compliance obligations. Technology companies developing ML inference systems, cloud computing platforms, or edge AI devices may find it relevant as prior art on inference optimization techniques. No regulatory deadlines, penalties, or required actions apply. The application remains pending until the USPTO examines it and possibly grants a patent.

Source document (simplified)


DEEP NEURAL NETWORKS (DNN) INFERENCE USING PRACTICAL EARLY EXIT NETWORKS

Application US20260086912A1 | Kind: A1 | Mar 26, 2026

Inventors

Anand PADMANABHA IYER, Swapnil Sunilkumar GANDHI

Abstract

The present disclosure relates to methods and systems for providing inferences using machine learning systems. The methods and systems receive a load forecast for processing requests by a machine learning model and split the machine learning model into a plurality of machine learning model portions based on the load forecast. The methods and systems determine a batch size for the requests for the machine learning model portions. The methods and systems use one or more available resources to execute the plurality of machine learning model portions to process the requests and generate inferences for the requests.
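The abstract does not say how a request moves through the portions; only the title's reference to early exit networks hints at it. As a hedged sketch of the general early-exit technique, not the claimed method, the Python below assumes each portion ends in an exit head that produces class probabilities. A request leaves the cascade at the first portion whose top probability reaches a confidence threshold, and the final portion always answers. `early_exit_batch`, `make_toy_portion`, the `Portion` interface, and the threshold value are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a portion maps one request's hidden state to
# (next hidden state, class probabilities from that portion's exit head).
Portion = Callable[[List[float]], Tuple[List[float], List[float]]]


def early_exit_batch(portions: Sequence[Portion], batch: List[List[float]],
                     threshold: float) -> List[Tuple[int, int]]:
    """Return (predicted class, index of the exit portion) for each request."""
    if not portions:
        raise ValueError("need at least one portion")
    results: List[Optional[Tuple[int, int]]] = [None] * len(batch)
    hidden = list(batch)
    active = list(range(len(batch)))  # requests still in the cascade
    for depth, portion in enumerate(portions):
        is_last = depth == len(portions) - 1
        survivors = []
        for i in active:
            hidden[i], probs = portion(hidden[i])
            best = max(range(len(probs)), key=probs.__getitem__)
            if probs[best] >= threshold or is_last:
                results[i] = (best, depth)  # confident (or last portion): exit here
            else:
                survivors.append(i)  # joins the next portion's smaller batch
        active = survivors
        if not active:
            break
    # Every request has exited by now, because the last portion always answers.
    return [r for r in results if r is not None]


def make_toy_portion(scale: float) -> Portion:
    """Toy portion: scales the state; its exit head is confident when |h[0]| nears 1."""
    def portion(h: List[float]) -> Tuple[List[float], List[float]]:
        h = [v * scale for v in h]
        p = min(abs(h[0]), 1.0)
        return h, [p, 1.0 - p]
    return portion


if __name__ == "__main__":
    toy = [make_toy_portion(1.0), make_toy_portion(2.0)]
    print(early_exit_batch(toy, [[0.95], [0.3]], threshold=0.9))
    # -> [(0, 0), (0, 1)]
```

Because confident requests leave early, each later portion processes fewer requests. A production system would replace the per-request inner loop with one batched call per portion.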