Training a neural network having sparsely-activated sub-networks using regularization
Assignee
Microsoft Technology Licensing, LLC
Inventors
Jian Jiao, Xiaodong Liu, Jianfeng Gao, Ruofei Zhang
Abstract
A training technique trains a neural network having sparsely-activated sub-networks. It does so by processing each of plural batches of training data in two passes through the neural network, yielding first prediction information and second prediction information. For each batch, the technique randomly assigns different sub-networks of the neural network to process the batch in the first and second passes. Over the course of training, the technique attempts to minimize loss information, which describes the difference between the first prediction information and ground-truth information, and the difference between the second prediction information and the ground-truth information. Simultaneously, the technique attempts to minimize divergence information, which describes the divergence of the first prediction information from the second prediction information (and vice versa). The technique can produce an inference-stage model by arbitrarily selecting at least one of the trained sub-networks of the neural network for use in a production system.
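The two-pass objective described in the abstract can be sketched in code. The following is a minimal illustration, not the patented implementation: the sub-networks are stand-in linear classifiers, and the names (`two_pass_loss`, `alpha`, the weighting of the divergence term) are assumptions chosen for clarity. It shows the core idea: two randomly assigned sub-networks process the same batch, each pass incurs a task loss against the ground truth, and a symmetric divergence term penalizes disagreement between the two predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the ground-truth labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def kl(p, q):
    # Mean KL divergence KL(p || q) over the batch.
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1))

def two_pass_loss(sub_networks, batch, labels, alpha=1.0):
    """Loss for one batch: two passes through randomly assigned,
    distinct sub-networks, a task loss for each pass, plus a
    symmetric divergence term pulling the two predictions together.
    `alpha` (a hypothetical weighting) balances the two objectives."""
    # Randomly assign different sub-networks to the first and second passes.
    first, second = rng.choice(len(sub_networks), size=2, replace=False)
    p1 = softmax(sub_networks[first](batch))   # first prediction information
    p2 = softmax(sub_networks[second](batch))  # second prediction information
    task_loss = cross_entropy(p1, labels) + cross_entropy(p2, labels)
    divergence = 0.5 * (kl(p1, p2) + kl(p2, p1))  # "and vice versa"
    return task_loss + alpha * divergence

# Toy sub-networks: each is a linear map from features to class logits.
num_classes, dim = 3, 4
sub_networks = [
    (lambda W: (lambda x: x @ W))(rng.normal(size=(dim, num_classes)))
    for _ in range(4)
]
batch = rng.normal(size=(8, dim))
labels = rng.integers(0, num_classes, size=8)
loss = two_pass_loss(sub_networks, batch, labels)
```

At inference time, the abstract's "arbitrarily selecting at least one of the trained sub-networks" would correspond here to simply keeping one element of `sub_networks` as the production model; the divergence term is what makes the sub-networks interchangeable in that way.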
CPC Classifications
Filing Date
2021-10-11
Application No.
17498737
Claims
19