ATTENTION MECHANISM ADJUSTMENT METHOD BASED ON ATTENTION SCORE AND COMPUTING DEVICE USING THE SAME
Assignee
Industrial Technology Research Institute
Inventors
Yao-Hua Chen, Po-Hung Lin, Chih-Tsun Huang
Abstract
An attention mechanism adjustment method based on attention scores, applicable to Transformer models, is provided. The method includes: for the current Transformer block of the Transformer model, obtaining a query matrix, a key matrix, and a value matrix based on the input sequence; using the self-attention module to generate multiple attention score matrices corresponding to multiple attention heads; before executing the softmax function, performing a cross-head column-wise aggregation operation on the attention score matrices to obtain a token importance vector; comparing the importance scores with a trained importance score threshold to determine whether pruning is needed; executing a pruning operation on target tokens that need pruning to obtain pruned attention score matrices; and performing the softmax function on the pruned attention score matrices to obtain a pruned attention probability matrix, in which the probability values of the pruned tokens are zero.
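The following is a minimal sketch, in NumPy, of the general flow the abstract describes: per-head attention scores are aggregated column-wise across heads into a token importance vector, tokens below a threshold are pruned before the softmax, and the pruned tokens receive zero probability. All function and variable names here are illustrative assumptions, the aggregation is taken to be a sum over heads and query rows, and the threshold is treated as a given scalar; this is not the patented implementation.

```python
import numpy as np

def attention_with_token_pruning(Q, K, V, threshold):
    """Sketch of attention-score-based token pruning (hypothetical names).

    Q, K, V: arrays of shape (num_heads, seq_len, d_head).
    threshold: scalar importance-score threshold (assumed to be trained elsewhere).
    """
    num_heads, seq_len, d_head = Q.shape

    # 1. Per-head attention score matrices: S_h = Q_h K_h^T / sqrt(d_head)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)          # (H, L, L)

    # 2. Cross-head column-wise aggregation before the softmax:
    #    sum the scores over heads and query rows to get one importance
    #    value per key token (one value per column).  (Assumed aggregation.)
    importance = scores.sum(axis=(0, 1))                          # (L,)

    # 3. Compare importance scores with the threshold; tokens below the
    #    threshold are marked as pruning targets.
    keep = importance >= threshold                                # (L,) boolean

    # 4. Prune the target tokens by masking their score columns to -inf,
    #    so the subsequent softmax assigns them zero probability.
    pruned_scores = np.where(keep[None, None, :], scores, -np.inf)

    # 5. Softmax over the key dimension of the pruned score matrices.
    pruned_scores = pruned_scores - pruned_scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(pruned_scores)
    probs = exp_scores / exp_scores.sum(axis=-1, keepdims=True)  # zeros at pruned tokens

    # 6. Attention output computed with the pruned probability matrices.
    return probs @ V, keep

# Example usage: 4 heads, 8 tokens, head dimension 16, arbitrary threshold.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8, 16)) for _ in range(3))
output, kept_tokens = attention_with_token_pruning(Q, K, V, threshold=0.0)
```

Masking the pruned columns to negative infinity before the softmax is one straightforward way to realize the abstract's requirement that pruned tokens end up with probability zero; a production implementation could instead drop the pruned columns entirely to save computation.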
CPC Classifications
Filing Date
2024-11-26
Application No.
18961430