Training a neural network having sparsely-activated sub-networks using regularization
Assignee
Microsoft Technology Licensing, LLC
Inventors
Jian Jiao, Xiaodong Liu, Jianfeng Gao, Ruofei Zhang
Abstract
A training technique trains a neural network having sparsely-activated sub-networks. It does so by processing each of plural batches of training data in two passes through the neural network, yielding first prediction information and second prediction information. For each batch, the technique randomly assigns different sub-networks of the neural network to process the batch in the first and second passes. Over the course of training, the technique attempts to minimize loss information, which describes the difference between the first prediction information and ground-truth information, and the difference between the second prediction information and the ground-truth information. Simultaneously, the technique attempts to minimize divergence information, which describes the divergence of the first prediction information from the second prediction information (and vice versa). The technique can produce an inference-stage model by arbitrarily selecting at least one of the trained sub-networks of the neural network for use in a production system.
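The two-pass objective described in the abstract can be sketched in code. The following is a minimal illustration, not the patented implementation: the sub-networks are stand-in linear classifiers, and the names (`two_pass_loss`, `alpha`, the weighting of the divergence term) are assumptions chosen for clarity. It shows the core idea: two randomly assigned sub-networks process the same batch, each pass incurs a task loss against the ground truth, and a symmetric divergence term penalizes disagreement between the two predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the ground-truth labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def kl(p, q):
    # Mean KL divergence KL(p || q) over the batch.
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1))

def two_pass_loss(sub_networks, batch, labels, alpha=1.0):
    """Loss for one batch: two passes through randomly assigned,
    distinct sub-networks, a task loss for each pass, plus a
    symmetric divergence term pulling the two predictions together.
    `alpha` (a hypothetical weighting) balances the two objectives."""
    # Randomly assign different sub-networks to the first and second passes.
    first, second = rng.choice(len(sub_networks), size=2, replace=False)
    p1 = softmax(sub_networks[first](batch))   # first prediction information
    p2 = softmax(sub_networks[second](batch))  # second prediction information
    task_loss = cross_entropy(p1, labels) + cross_entropy(p2, labels)
    divergence = 0.5 * (kl(p1, p2) + kl(p2, p1))  # "and vice versa"
    return task_loss + alpha * divergence

# Toy sub-networks: each is a linear map from features to class logits.
num_classes, dim = 3, 4
sub_networks = [
    (lambda W: (lambda x: x @ W))(rng.normal(size=(dim, num_classes)))
    for _ in range(4)
]
batch = rng.normal(size=(8, dim))
labels = rng.integers(0, num_classes, size=8)
loss = two_pass_loss(sub_networks, batch, labels)
```

At inference time, the abstract's "arbitrarily selecting at least one of the trained sub-networks" would correspond here to simply keeping one element of `sub_networks` as the production model; the divergence term is what makes the sub-networks interchangeable in that way.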
CPC Classifications
Filing Date
2021-10-11
Application No.
17498737
Claims
19