FINE-TUNING MULTI-HEAD NETWORK FROM A SINGLE TRANSFORMER LAYER OF PRE-TRAINED LANGUAGE MODEL
Assignee
Oracle International Corporation
Inventors
Thanh Tien Vu, Tuyen Quang Pham, Omid Mohamad Nezami, Mark Edward Johnson, Thanh Long Duong, Cong Duy Vu Hoang
Abstract
Techniques are provided for customizing or fine-tuning a pre-trained version of a machine-learning model that includes multiple layers and is configured to process audio or textual language input. Each of the multiple layers is configured with a plurality of layer-specific pre-trained parameter values corresponding to a plurality of parameters, and each of the multiple layers is configured to implement multi-head attention. An incomplete subset of the multiple layers is identified for which corresponding layer-specific pre-trained parameter values are to be fine-tuned using a client data set. The machine-learning model is fine-tuned using the client data set to generate an updated version of the machine-learning model, where the layer-specific pre-trained parameter values configured for each layer of one or more of the multiple layers not included in the incomplete subset are frozen during the fine-tuning. Use of the updated version of the machine-learning model is facilitated.
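A minimal sketch of the selective fine-tuning the abstract describes, assuming a Hugging Face BERT-style encoder (the model name, layer index, and library are illustrative choices, not named in the patent): every pre-trained parameter is frozen by default, and only a single transformer layer plus the attached task head are updated on the client data set.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load a pre-trained multi-layer transformer; each encoder layer
# implements multi-head attention (model name is illustrative).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze every layer-specific pre-trained parameter by default.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the incomplete subset of layers to fine-tune --
# here a single transformer layer (the last encoder layer).
LAYER_TO_TUNE = 11  # hypothetical choice of layer
for param in model.bert.encoder.layer[LAYER_TO_TUNE].parameters():
    param.requires_grad = True

# The task head on top is also trained on the client data set.
for param in model.classifier.parameters():
    param.requires_grad = True

# The optimizer only sees the unfrozen parameters, so the remaining
# layers keep their pre-trained values during fine-tuning.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```

In a training loop, `optimizer.step()` then updates only the selected layer and head, which is one straightforward way to realize the frozen-layer fine-tuning behavior described above.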
CPC Classifications
Filing Date
2025-11-26
Application No.
19402418