End-to-end segmentation in a two-pass cascaded encoder automatic speech recognition model
Assignee
Google LLC
Inventors
Wenqian Ronny Huang, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He
Abstract
A unified end-to-end segmenter and two-pass automatic speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder is configured to receive a sequence of acoustic frames and generate a first higher-order feature representation. The first decoder is configured to receive the first higher-order feature representation; generate, at each of a plurality of output steps, a first probability distribution and an indication of whether the output step corresponds to an end-of-speech segment; and emit an end-of-speech timestamp. The second encoder is configured to receive the first higher-order feature representation and the end-of-speech timestamp, and generate a second higher-order feature representation. The second decoder is configured to receive the second higher-order feature representation and generate a second probability distribution.
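The data flow described in the abstract can be illustrated with a toy sketch. This is not Google's implementation. Every component is a hypothetical stand-in: the "encoders" and "decoders" are trivial arithmetic rather than neural networks, `VOCAB` and its `<eos>` end-of-segment token are invented for illustration, and two further points are assumptions rather than anything the abstract states. The first assumption is that the first pass is causal and the second pass may look ahead within a segment. The second is that the end-of-speech timestamp bounds the segment that the second encoder processes. The sketch only shows how the four components and the timestamp connect.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]

# Hypothetical vocabulary; the last entry is an assumed end-of-segment token.
VOCAB = ["a", "b", "<eos>"]
EOS_INDEX = len(VOCAB) - 1


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one output step."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def first_encoder(frames: Sequence[Vector]) -> List[Vector]:
    """Stand-in causal encoder: fixed per-frame scaling, no lookahead."""
    return [[2.0 * x for x in frame] for frame in frames]


def first_decoder(features: Sequence[Vector]) -> Tuple[List[Vector], List[bool]]:
    """Per output step: a first probability distribution and an end-of-segment flag."""
    dists = [softmax(f) for f in features]
    # Assumption: a step is flagged as end of segment when <eos> is most likely.
    is_end = [argmax(d) == EOS_INDEX for d in dists]
    return dists, is_end


def end_of_speech_timestamp(is_end: Sequence[bool]) -> Optional[int]:
    """Index of the first output step flagged as end of segment, or None."""
    for t, flag in enumerate(is_end):
        if flag:
            return t
    return None


def second_encoder(features: Sequence[Vector], eos_t: Optional[int]) -> List[Vector]:
    """Stand-in non-causal encoder over the segment ending at eos_t.

    Assumption: the timestamp bounds the segment, and the segment mean
    (which includes later frames within the segment) is added to each frame.
    """
    segment = list(features if eos_t is None else features[: eos_t + 1])
    dim = len(segment[0])
    mean = [sum(f[i] for f in segment) / len(segment) for i in range(dim)]
    return [[x + m for x, m in zip(frame, mean)] for frame in segment]


def second_decoder(features: Sequence[Vector]) -> List[Vector]:
    """Per output step: a second-pass probability distribution."""
    return [softmax(f) for f in features]


def run_two_pass(frames: Sequence[Vector]):
    h1 = first_encoder(frames)              # first higher-order features
    first_dists, is_end = first_decoder(h1)
    eos_t = end_of_speech_timestamp(is_end)
    h2 = second_encoder(h1, eos_t)          # second higher-order features
    second_dists = second_decoder(h2)
    return first_dists, eos_t, second_dists


if __name__ == "__main__":
    frames = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0], [3.0, 0.0, 0.0]]
    _, eos_t, second = run_two_pass(frames)
    print(eos_t)                                # 2
    print([VOCAB[argmax(d)] for d in second])   # ['a', 'b', '<eos>']
```

In this toy run, the first decoder flags step 2 as the end of the segment. The second pass then covers only steps 0 through 2, and the frame after the timestamp is left for a later segment. Because segmentation and both recognition passes share one model in this design, the second encoder can consume the first encoder's features directly. A separate segmenter would not be needed.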
Filing Date
2023-11-17
Application No.
18512110
Claims
37