Training Transformers Using Sliceout
Assignee
Cohere Inc.
Inventors
Aidan GOMEZ, Seoyeon YOO
Abstract
A system for training a neural network using dropout with slicing operations preserves the regularization effects of dropout while speeding up computation and reducing the memory requirements of training. Instead of randomly dropping individual weights connected to neurons in a neural network, the system slices contiguous memory segments of weight matrices. For transformer models, the approach first receives input data consisting of a sequence of elements. Input embedding vectors with positional encoding are generated from the input data. The transformer model is then trained by passing the input embedding vectors through its neural network layers. In the linear layers, some weight matrices are sliced (e.g., masked) so that a contiguous section of each sliced matrix is kept and used for training, while the rest of the matrix is never accessed.
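The slicing idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the PyTorch-style (d_out, d_in) weight layout, the inverted-dropout-style rescaling, and the function name `sliceout_linear` are all assumptions made for this sketch.

```python
import numpy as np

def sliceout_linear(x, W, keep, rng):
    """Linear layer with Sliceout-style slicing (illustrative sketch).

    Instead of masking randomly scattered weights as in standard dropout,
    keep one contiguous block of `keep` rows of W; the rest of the matrix
    is never read, so the matmul is smaller and cheaper.
    W has shape (d_out, d_in), PyTorch-style (an assumption of this sketch).
    """
    d_out = W.shape[0]
    start = int(rng.integers(0, d_out - keep + 1))  # random slice offset
    W_slice = W[start:start + keep]  # contiguous rows in row-major storage
    # Rescale so expected activation magnitudes stay comparable, mirroring
    # inverted dropout (an assumption of this sketch, not from the patent).
    return (x @ W_slice.T) * (d_out / keep), start

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))    # batch of 2, input dim 8
W = rng.standard_normal((16, 8))   # output dim 16, input dim 8
out, start = sliceout_linear(x, W, keep=4, rng=rng)
print(out.shape)  # only the kept slice is computed
```

Because the kept block is contiguous, the sliced matmul operates on a plain memory view rather than a gather over scattered weights, which is the source of the claimed speed and memory savings.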
CPC Classifications
Filing Date
2025-12-08
Application No.
19412214