ALIGNMENT OF NEURAL NETWORKS USING ARCHITECTURAL MODIFICATIONS AND TRAINING EXAMPLES

Application: US20260093990A1, Kind: A1, Date: Apr 02, 2026

Inventors

Xiangyu Qi, Xiao Ma, Ahmad Beirami

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for aligning the output of a pre-trained generative neural network. In one aspect, the pre-trained generative neural network is adapted by introducing one or more filter layers. Each filter layer processes a filter layer input comprising an output from a stack of the pre-trained neural network layers, in accordance with trainable parameters of the filter layer, to generate a filter layer output. A next neural network layer after the stack of pre-trained neural network layers is configured to process at least the filter layer output. The trainable parameters of the filter layer(s) are adjusted using a training objective to increase the likelihood of the adapted neural network generating aligned responses to a plurality of training requests, whilst keeping pre-trained trainable parameters of the pre-trained neural network layers fixed.
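
The arrangement described in the abstract can be illustrated with a small sketch. This is not the claimed implementation: it is a toy example under stated assumptions, written in plain Python. The names `W1`, `W2`, `F`, `forward`, `mean_loss`, and `train_step` are hypothetical, the residual form of the filter layer (output = input + F·input) and its zero initialisation are assumptions, and cross-entropy on the index of an aligned response stands in for the training objective. Only `F` is updated; the pre-trained weights stay fixed.

```python
import copy
import math
import random

random.seed(0)
D, V = 4, 3  # toy hidden width and output vocabulary size (assumed values)

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Stand-ins for pre-trained parameters; they are never updated below.
W1 = rand_matrix(D, D)  # the stack of pre-trained layers (a single layer here)
W2 = rand_matrix(V, D)  # the next layer after the stack
frozen_snapshot = copy.deepcopy((W1, W2))

# Filter layer parameters: the only trainable ones. Zero initialisation
# (an assumption) makes the adapted network start out identical to the
# pre-trained one.
F = [[0.0] * D for _ in range(D)]

def forward(x):
    h = [math.tanh(a) for a in matvec(W1, x)]         # output of the stack
    f = [hi + di for hi, di in zip(h, matvec(F, h))]  # filter layer output
    logits = matvec(W2, f)                            # next layer consumes it
    top = max(logits)                                 # stabilise the softmax
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return h, [e / total for e in exps]

def mean_loss(data):
    """Average negative log-likelihood of the aligned responses."""
    return sum(-math.log(forward(x)[1][y]) for x, y in data) / len(data)

def train_step(data, lr=0.05):
    """One full-batch gradient step on F only; W1 and W2 are left untouched."""
    grad = [[0.0] * D for _ in range(D)]
    for x, y in data:
        h, p = forward(x)
        # Gradient of cross-entropy with respect to the logits.
        dlogits = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Back-propagate through the frozen next layer to the filter output.
        df = [sum(W2[k][i] * dlogits[k] for k in range(V)) for i in range(D)]
        for i in range(D):
            for j in range(D):
                grad[i][j] += df[i] * h[j] / len(data)
    for i in range(D):
        for j in range(D):
            F[i][j] -= lr * grad[i][j]

# Toy training requests: (request features, index of the aligned response).
data = [([1.0, 0.0, 0.5, -0.5], 2), ([0.0, 1.0, -0.5, 0.5], 0)]

loss_before = mean_loss(data)
for _ in range(200):
    train_step(data)
loss_after = mean_loss(data)
```

Comparing `loss_before` with `loss_after` shows whether the likelihood of the aligned responses rose, and comparing `(W1, W2)` with `frozen_snapshot` confirms the pre-trained parameters were not modified. A real system would use a generative model and an automatic-differentiation framework rather than hand-written gradients.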

CPC Classifications

G06N 3/082, G06N 3/084

Filing Date

2025-10-01

Application No.

19346875
