USPTO Patent Grant: Layered Gradient Accumulation for Machine Learning
Summary
The USPTO has granted patent US12585929B2 to ServiceNow, Inc. for a method of layered gradient accumulation and modular pipeline parallelism to improve machine learning model training. The patent details a technique for distributing model layers across compute nodes and optimizing data flow during training.
What changed
The United States Patent and Trademark Office (USPTO) has granted patent US12585929B2, "Layered gradient accumulation and modular pipeline parallelism for improved training of machine learning models," to ServiceNow, Inc. The described technique assigns the sequentially-ordered layers of a machine learning model to multiple compute nodes, divides training data into micro-batches, and pipelines forward and backward propagation across the nodes in parallel, with activation states communicated between nodes in the layers' sequential order and error states communicated in reverse order.
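To make the pipeline-parallel structure concrete, here is a minimal Python sketch of the setup described above: layers are partitioned across nodes, data is split into micro-batches, and activations flow node-to-node in the layers' sequential order. This is a toy illustration, not the patented implementation; the names `Node`, `split_into_microbatches`, and `pipeline_forward` are illustrative assumptions, not identifiers from the patent.

```python
def split_into_microbatches(data, num_microbatches):
    """Divide training data into equal-sized micro-batches."""
    size = len(data) // num_microbatches
    return [data[i * size:(i + 1) * size] for i in range(num_microbatches)]

class Node:
    """A compute node holding a contiguous slice of the model's layers."""
    def __init__(self, layer_fns):
        # Sequentially-ordered layer functions assigned to this node.
        self.layer_fns = layer_fns

    def forward(self, activations):
        # Apply this node's layers in order to each micro-batch element.
        for fn in self.layer_fns:
            activations = [fn(a) for a in activations]
        return activations

def pipeline_forward(nodes, microbatch):
    """Forward-propagate one micro-batch, communicating activation
    states node-to-node according to the layers' sequential ordering."""
    activations = microbatch
    for node in nodes:
        activations = node.forward(activations)
    return activations

# Two nodes, each owning one layer: x -> x + 1, then x -> 2x.
nodes = [Node([lambda x: x + 1]), Node([lambda x: x * 2])]
pipeline_forward(nodes, [1, 2])  # -> [4, 6]
```

In a real deployment the node-to-node hand-off would be an inter-node communication (e.g. a send/receive over an interconnect) rather than a Python function call, and the nodes would process different micro-batches concurrently rather than in a single loop.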
This patent grant is primarily of interest to technology companies and researchers involved in AI and machine learning development. While it does not impose direct regulatory obligations on businesses, it represents a significant intellectual property development in the field of AI model training. Companies developing or utilizing similar training methodologies should be aware of this patent to ensure they do not infringe on the granted claims. No immediate compliance actions are required, but it may influence future R&D strategies and licensing considerations.
Source document (simplified)
Layered gradient accumulation and modular pipeline parallelism for improved training of machine learning models
Grant US12585929B2 · Kind: B2 · Granted: Mar 24, 2026
Assignee
ServiceNow, Inc.
Inventors
Joel Lamy-Poirier
Abstract
A method is provided including: (i) assigning sequentially-ordered layers of a machine learning model to a plurality of compute nodes, each of the layers being assigned to exactly one of the nodes; (ii) dividing training data into micro-batches; (iii) forward-propagating the micro-batches through the model, each node operating in parallel to generate respective activation states for the micro-batches with their assigned layers, and with the activation states being communicated between the nodes according to the layers' sequential ordering; and (iv) backward-propagating the micro-batches through the model, each node operating in parallel to generate respective error states for the micro-batches with their assigned layers, with the error states being communicated between the nodes according to the layers' reverse sequential ordering, wherein each of the nodes completes the backward-propagation of all micro-batches through a given layer prior to performing backward-propagation through any layer that precedes the given layer in the sequential ordering.
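The distinctive constraint in step (iv) is the backward ordering: within a node, all micro-batches finish backward-propagating through a given layer before any micro-batch proceeds to a preceding layer, which lets the layer's gradient be accumulated once per layer rather than once per micro-batch. The sketch below illustrates that loop ordering under stated assumptions: it is a toy scalar model, and `layered_backward` and `grad_fn` are hypothetical names, not the patent's actual interfaces.

```python
def layered_backward(layers, microbatch_errors, grad_fn):
    """Backward-propagate all micro-batches through a node's layers.

    layers            -- the node's layers in sequential (forward) order
    microbatch_errors -- one error state per micro-batch at the node's output
    grad_fn(layer, error) -> (param_grad, upstream_error)

    Returns the accumulated parameter gradient per layer and the error
    states to communicate to the preceding node.
    """
    accumulated = {}
    errors = list(microbatch_errors)
    # Layers are visited in reverse sequential ordering...
    for layer in reversed(layers):
        total = 0.0
        next_errors = []
        # ...and every micro-batch completes this layer before the node
        # moves to any preceding layer (the layered accumulation order).
        for err in errors:
            param_grad, upstream = grad_fn(layer, err)
            total += param_grad      # accumulate the layer's gradient
            next_errors.append(upstream)
        accumulated[layer] = total
        errors = next_errors
    return accumulated, errors

# Toy gradient rule: the parameter gradient equals the incoming error,
# and the upstream error is halved at each layer.
acc, errs = layered_backward(["w1", "w2"], [1.0, 2.0],
                             lambda layer, err: (err, err * 0.5))
# acc  -> {"w2": 3.0, "w1": 1.5}
# errs -> [0.25, 0.5]
```

One practical consequence of this ordering is that a layer's weight gradient is fully formed as soon as its backward pass finishes, so gradient communication (e.g. for data parallelism) can overlap with the backward passes of the remaining layers.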
CPC Classifications
G06F 18/2155 G06N 3/084 G06N 3/063
Filing Date
2022-02-09
Application No.
17668200
Claims
19
Source: ChangeBridge — Patent Grants - AI & Computing (G06N)