Optimizing low precision inference models for deployment of deep neural networks
Assignee
Intel Corporation
Inventors
Jiong Gong, Yong Wu, Haihao Shen, Xiao Dong Lin, Guoming Zhang, Feng Yuan
Abstract
Systems, apparatuses, and methods may provide technology for optimizing an inference neural network model that performs asymmetric quantization. The technology may generate a quantized neural network in which model weights are quantized as signed integer values and an input layer is configured to quantize input values as unsigned integer values, generate a weights accumulation table based on the quantized model weights and a kernel size for the neural network, and generate an output restoration function for an output layer of the neural network based on the weights accumulation table and the kernel size. The technology may also perform per-input-channel quantization and mixed-precision auto-tuning.
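The abstract's core idea can be illustrated with a small numeric sketch: when inputs are quantized asymmetrically (unsigned, with a zero point) and weights symmetrically (signed, zero point 0), the zero-point correction in the output can be folded into a single precomputed sum of the quantized weights, applied once by a restoration function rather than per input element. The code below is an illustrative sketch under those assumptions; all names, scales, and zero points are hypothetical and are not taken from the patent.

```python
# Illustrative sketch (hypothetical names/values, not the patented implementation):
# asymmetric unsigned-int input quantization, signed-int weight quantization,
# a precomputed weights-accumulation term, and an output restoration step.

def quantize_unsigned(x, scale, zero_point):
    """Quantize a float input to unsigned 8-bit using a zero point."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def quantize_signed(w, scale):
    """Quantize a float weight to signed 8-bit (symmetric, zero point 0)."""
    q = round(w / scale)
    return max(-128, min(127, q))

weights = [0.5, -0.25, 0.75]   # one 1x3 "kernel" (hypothetical)
inputs  = [1.0, 2.0, -1.0]

w_scale = 0.01
x_scale = 0.02
x_zero  = 128                  # asymmetric zero point for unsigned inputs

qw = [quantize_signed(w, w_scale) for w in weights]
qx = [quantize_unsigned(x, x_scale, x_zero) for x in inputs]

# Pure integer accumulation: sum(qw * qx).
acc = sum(a * b for a, b in zip(qw, qx))

# Precomputed weights-accumulation term (one entry of a "weights
# accumulation table"): the sum of the quantized kernel weights.
w_acc = sum(qw)

# Output restoration: subtract zero_point * w_acc once, instead of
# subtracting the zero point from every input element, then rescale.
restored = x_scale * w_scale * (acc - x_zero * w_acc)

# Float reference dot product for comparison.
reference = sum(w * x for w, x in zip(weights, inputs))
```

With these exactly representable example values the restored output matches the float dot product; in general, quantization rounding introduces a small error, but the zero-point algebra itself is exact.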
CPC Classifications
Filing Date
2020-03-13
Application No.
17929023
Claims
25