Optimizing low precision inference models for deployment of deep neural networks
Summary
The USPTO granted Patent US12596917B2 to Intel Corporation covering systems and methods for optimizing low precision inference models using asymmetric quantization in deep neural networks. The patent includes claims for per-input channel quantization and mixed-precision auto-tuning techniques. The patent names six inventors and contains 25 claims.
What changed
The USPTO issued a patent grant (kind code B2, indicating a utility patent grant whose application was previously published) to Intel Corporation for neural network quantization optimization technology. The invention provides methods for generating quantized neural networks with asymmetric quantization, in which model weights are signed integers and input layers use unsigned integers, along with weights accumulation tables and an output restoration function.
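The scheme described above can be illustrated with a minimal NumPy sketch. This is an interpretation of the public claim language, not Intel's implementation: weights are quantized to signed int8, inputs to unsigned uint8 with a zero-point, a per-output-channel "weights accumulation table" is precomputed, and an "output restoration" step removes the zero-point contribution and rescales. All names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Float reference layer: y = W @ x  (4 output channels, kernel size 8)
W = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.uniform(-2.0, 6.0, size=8).astype(np.float32)

# Weights -> signed int8, symmetric (zero-point 0)
w_scale = np.abs(W).max() / 127.0
Wq = np.round(W / w_scale).astype(np.int8)

# Inputs -> unsigned uint8 with a zero-point (asymmetric quantization)
x_scale = (x.max() - x.min()) / 255.0
x_zp = int(round(-x.min() / x_scale))
xq = np.clip(np.round(x / x_scale) + x_zp, 0, 255).astype(np.uint8)

# "Weights accumulation table": per-output-channel sums of the quantized
# weights, precomputed once at model-conversion time.
w_acc = Wq.astype(np.int32).sum(axis=1)

# Integer-only matmul, accumulated in int32
acc = Wq.astype(np.int32) @ xq.astype(np.int32)

# "Output restoration": subtract the input zero-point contribution using
# the accumulation table, then rescale back to float.
y = (acc - x_zp * w_acc) * (w_scale * x_scale)

print(np.abs(y - W @ x).max())  # small quantization error
```

The key algebraic point is that with x ≈ (xq − zp)·sx and W ≈ Wq·sw, the product expands to (Wq @ xq − zp · Σ Wq) · sw·sx, so the zero-point term depends only on the precomputed weight sums, keeping the inner loop integer-only.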
Technology companies developing AI inference models and manufacturers of AI accelerators/chips should monitor this patent portfolio. The 25 granted claims provide Intel exclusive rights to specific quantization optimization techniques that may be relevant to deploying deep neural networks in edge computing, data centers, or specialized AI hardware.
What to do next
- Monitor for updates
Source document (simplified)
Optimizing low precision inference models for deployment of deep neural networks
Grant: US12596917B2 · Kind: B2 · Granted: Apr 07, 2026
Assignee
Intel Corporation
Inventors
Jiong Gong, Yong Wu, Haihao Shen, Xiao Dong Lin, Guoming Zhang, Feng Yuan
Abstract
Systems, apparatuses and methods may provide technology for optimizing an inference neural network model that performs asymmetric quantization by generating a quantized neural network, wherein model weights of the neural network are quantized as signed integer values, and wherein an input layer of the neural network is configured to quantize input values as unsigned integer values, generating a weights accumulation table based on the quantized model weights and a kernel size for the neural network, and generating an output restoration function for an output layer of the neural network based on the weights accumulation table and the kernel size. The technology may also perform per-input channel quantization. The technology may also perform mixed-precision auto-tuning.
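The abstract also names per-input-channel quantization. A minimal sketch of why that matters (illustrative shapes and values, not the patented method): when input channels have very different magnitudes, a single per-tensor scale wastes integer resolution on the small channels, while a per-channel scale does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Weight matrix with 8 input channels; one channel dominates in magnitude,
# which makes a single shared scale a poor fit for the others.
W = rng.normal(size=(4, 8)).astype(np.float32)
W[:, 0] *= 50.0

# Per-tensor quantization: one int8 scale for the whole matrix
s_tensor = np.abs(W).max() / 127.0
Wq_tensor = np.round(W / s_tensor).astype(np.int8)

# Per-input-channel quantization: one scale per column (input channel)
s_chan = np.abs(W).max(axis=0) / 127.0          # shape (8,)
Wq_chan = np.round(W / s_chan).astype(np.int8)

# Mean reconstruction error of each scheme
err_tensor = np.abs(Wq_tensor * s_tensor - W).mean()
err_chan = np.abs(Wq_chan * s_chan - W).mean()
print(err_tensor, err_chan)  # per-channel error is noticeably smaller
```

The mixed-precision auto-tuning mentioned in the abstract would build on the same idea at a coarser granularity: choosing a bit width per layer based on measured accuracy impact, rather than a scale per channel.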
CPC Classifications
G06N 3/0495 G06N 3/08 G06N 3/045 G06N 3/063
Filing Date
2020-03-13
Application No.
17929023
Claims
25