Layered gradient accumulation and modular pipeline parallelism for improved training of machine learning models
Assignee
ServiceNow, Inc.
Inventors
Joel Lamy-Poirier
Abstract
A method is provided including: (i) assigning sequentially-ordered layers of a machine learning model to a plurality of compute nodes, each of the layers being assigned to exactly one of the nodes; (ii) dividing training data into micro-batches; (iii) forward-propagating the micro-batches through the model, each node operating in parallel to generate respective activation states for the micro-batches with their assigned layers, and with the activation states being communicated between the nodes according to the layers' sequential ordering; and (iv) backward-propagating the micro-batches through the model, each node operating in parallel to generate respective error states for the micro-batches with their assigned layers, with the error states being communicated between the nodes according to the layers' reverse sequential ordering, wherein each of the nodes completes the backward-propagation of all micro-batches through a given layer prior to performing backward-propagation through any layer that precedes the given layer in the sequential ordering.
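The schedule described in steps (i)–(iv) can be illustrated with a hypothetical single-process sketch (all names and the toy linear-layer model are illustrative assumptions, not the patented implementation): layers are partitioned across "nodes", every micro-batch is forward-propagated in layer order, and the backward pass accumulates each layer's gradient over all micro-batches before any preceding layer is visited — the layered gradient accumulation of the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain of linear layers y = x @ W.T, partitioned over two "nodes".
# (Illustrative stand-in for the model; the patent does not specify layer types.)
layers = [rng.standard_normal((4, 4)) * 0.1 for _ in range(4)]
node_of_layer = [0, 0, 1, 1]              # (i) each layer assigned to exactly one node

data = rng.standard_normal((8, 4))
micro_batches = np.split(data, 4)         # (ii) divide training data into micro-batches

# (iii) forward propagation: activations flow between nodes in layer order;
# acts[m][l] holds the input to layer l for micro-batch m.
acts = [[mb] for mb in micro_batches]
for l, W in enumerate(layers):
    for m in range(len(micro_batches)):
        acts[m].append(acts[m][l] @ W.T)

# (iv) backward propagation with layered gradient accumulation: a layer's
# gradient is fully accumulated over ALL micro-batches before the loop
# advances to any layer that precedes it in the sequential ordering.
errors = [a[-1].copy() for a in acts]     # stand-in for dL/d(output) per micro-batch
grads = [np.zeros_like(W) for W in layers]
for l in reversed(range(len(layers))):
    for m in range(len(micro_batches)):
        grads[l] += errors[m].T @ acts[m][l]  # accumulate across micro-batches
        errors[m] = errors[m] @ layers[l]     # propagate error state upstream
```

In a real multi-node deployment the two loops over micro-batches would run in parallel on separate devices, with activation and error states communicated between neighboring nodes; the single-process ordering above only demonstrates the claimed per-layer completion constraint.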
CPC Classifications
Filing Date
2022-02-09
Application No.
17/668,200
Claims
19