Training Transformers Using Sliceout Dropout
Summary
The USPTO has published patent application US20260093985A1, filed by Cohere Inc., covering a system and method for training transformer neural networks using 'Sliceout' dropout operations. Instead of randomly dropping individual weights, the method slices contiguous memory segments of weight matrices, preserving dropout's regularization effect while reducing computation and memory requirements. Application No. 19412214 was filed December 8, 2025.
What changed
Cohere Inc.'s US patent application (US20260093985A1) for a neural network training system using 'Sliceout' dropout has been published. The system modifies traditional dropout by slicing contiguous memory segments of weight matrices in the linear layers of a transformer model: a contiguous section of each sliced weight matrix is kept and used for training, and the rest of the matrix is not accessed. The approach retains dropout's regularization benefits while improving computational efficiency. The named inventors are Aidan Gomez and Seoyeon Yoo. The method applies to transformers trained on sequential input data with positional encoding.
Published patent applications do not impose compliance obligations on third parties. Technology companies developing neural network training systems may wish to review the published claims for potential licensing implications or to assess whether their own training methodologies overlap with the claimed approach. This is informational only and does not create regulatory requirements or deadlines.
Archived snapshot
Apr 2, 2026
GovPing captured this document from the original source. If the source has since changed or been removed, this is the text as it existed at that time.
Training Transformers Using Sliceout
Application US20260093985A1 Kind: A1 Apr 02, 2026
Assignee
Cohere Inc.
Inventors
Aidan GOMEZ, Seoyeon YOO
Abstract
A system for training the neural network using dropout with slicing operations preserves the regularization effects of dropout, while speeding up computations and reducing the memory requirements of training the neural network. Instead of randomly dropping weights connected to neurons in a neural network, the system slices contiguous memory segments of weight matrices. For transformer models, the approach first receives input data that consist of a sequence of elements. Based on the input data, input embedding vectors with positional encoding are generated. Then the transformer model is trained by passing the input embedding vectors through various neural network layers. While passing through linear layers, some of the weight matrices are sliced (e.g., masked) such that a contiguous section of a weight matrix is kept unsliced and used for training and the rest of the weight matrix is not accessed.
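The slicing step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the claimed implementation: the function name `sliceout_linear`, the `(d_out, d_in)` weight layout, the random choice of the slice start, and the rescaling by the kept fraction (borrowed from inverted dropout) are all assumptions made for illustration.

```python
import numpy as np

def sliceout_linear(x, weight, keep_frac, rng):
    """Hypothetical Sliceout-style forward pass for one linear layer.

    weight is stored as (d_out, d_in) in row-major order, so a range of
    rows is a single contiguous segment of memory.
    """
    d_out = weight.shape[0]
    k = max(1, int(round(keep_frac * d_out)))    # number of output units kept
    start = int(rng.integers(0, d_out - k + 1))  # assumed: random slice start
    kept = weight[start:start + k]               # a view; no copy, no mask
    # Only the kept rows take part in the matmul; the rest are never read.
    # Assumed: rescale by the kept fraction, as inverted dropout does.
    y = (x @ kept.T) / (k / d_out)
    return y, start, kept

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of 4 input vectors, d_in = 8
weight = rng.standard_normal((16, 8))  # d_out = 16
y, start, kept = sliceout_linear(x, weight, keep_frac=0.5, rng=rng)
print(y.shape, kept.flags["C_CONTIGUOUS"])
```

Compared with standard dropout, which computes the full product and then multiplies by a random mask, this sketch multiplies by a smaller matrix and produces a smaller output, which is where the compute and memory savings described in the abstract would come from.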
CPC Classifications
G06N 3/08, G06F 16/90335
Filing Date
2025-12-08
Application No.
19412214
About this page
Source document text, dates, docket IDs, and authority are extracted directly from USPTO.
The plain-English summary, classification, and "what to do next" steps are AI-generated from the original text. Cite the source document, not the AI analysis.