End-to-end segmentation in a two-pass cascaded encoder automatic speech recognition model

Grant US12586579B2 (Kind B2), Mar 24, 2026

Assignee

Google LLC

Inventors

Wenqian Ronny Huang, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He

Abstract

A unified end-to-end segmenter and two-pass automatic speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder is configured to receive a sequence of acoustic frames and generate a first higher order feature representation. The first decoder is configured to receive the first higher order feature representation and generate, at each of a plurality of output steps, a first probability distribution and an indication of whether the output step corresponds to an end of speech segment, and emit an end of speech timestamp. The second encoder is configured to receive the first higher order feature representation and the end of speech timestamp, and generate a second higher order feature representation. The second decoder is configured to receive the second higher order feature representation and generate a second probability distribution.
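The abstract describes a data flow among four components, which can be traced with a toy Python sketch. Only the four roles and the end of speech timestamp come from the abstract. Everything else is an assumption made for illustration: the tiny vocabulary, treating an `<eos>` token as the end-of-segment indication, one output step per acoustic frame, the smoothing stand-in for the first encoder, and having the second encoder look only within the segment that ends at the timestamp. The function names are hypothetical, and the arithmetic placeholders stand in for trained neural networks.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]

# Hypothetical vocabulary; "<eos>" is assumed to mark an end of speech segment.
VOCAB = ["<blank>", "a", "b", "<eos>"]
EOS_ID = VOCAB.index("<eos>")


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax, used as each decoder's probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_encoder(frames: List[Vector]) -> List[Vector]:
    """Causal stand-in: each output depends only on the current and past frames."""
    out: List[Vector] = []
    running: Optional[Vector] = None
    for f in frames:
        running = f[:] if running is None else [0.5 * r + 0.5 * x for r, x in zip(running, f)]
        out.append(running)
    return out


def first_decoder(feats: List[Vector]) -> Tuple[List[Vector], List[bool], Optional[int]]:
    """Per output step: a distribution plus an end-of-segment flag; emits the first EOS timestamp."""
    probs_per_step: List[Vector] = []
    eos_flags: List[bool] = []
    eos_timestamp: Optional[int] = None
    for t, h in enumerate(feats):
        probs = softmax(h)
        is_eos = max(range(len(probs)), key=probs.__getitem__) == EOS_ID
        probs_per_step.append(probs)
        eos_flags.append(is_eos)
        if is_eos and eos_timestamp is None:
            eos_timestamp = t
    return probs_per_step, eos_flags, eos_timestamp


def second_encoder(feats: List[Vector], eos_timestamp: Optional[int]) -> List[Vector]:
    """Assumed behavior: uses left and right context, clipped to the segment ending at the timestamp."""
    end = len(feats) if eos_timestamp is None else eos_timestamp + 1
    segment = feats[:end]
    out: List[Vector] = []
    for t in range(len(segment)):
        window = segment[max(0, t - 1): t + 2]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def second_decoder(feats: List[Vector]) -> List[Vector]:
    """Second-pass probability distribution for each step in the segment."""
    return [softmax(h) for h in feats]


def two_pass_segmenter(frames: List[Vector]):
    """Hypothetical end-to-end flow: encoder 1 -> decoder 1 -> encoder 2 -> decoder 2."""
    first_feats = first_encoder(frames)
    first_probs, eos_flags, eos_ts = first_decoder(first_feats)
    second_feats = second_encoder(first_feats, eos_ts)
    return first_probs, eos_flags, eos_ts, second_decoder(second_feats)


if __name__ == "__main__":
    frames = [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 4.0], [0.0, 2.0, 0.0, 0.0]]
    _, flags, ts, second = two_pass_segmenter(frames)
    print(flags, ts, len(second))  # → [False, False, True, False] 2 3
```

The point the sketch tries to capture is the ordering: the first pass emits the end of speech timestamp while streaming, and the second encoder consumes both the first higher order feature representation and that timestamp, so second-pass processing can be bounded by the segment the first pass detected.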

CPC Classifications

G10L 15/063 G10L 15/16 G10L 15/22 G10L 15/05 G10L 15/02 G10L 15/32 G10L 2015/0631 G10L 15/197 G10L 2015/025 G10L 15/28 G10L 15/30 G10L 15/19 G10L 15/167 G10L 15/183 G10L 15/26 G06N 20/00 G06N 5/00

Filing Date

2023-11-17

Application No.

18512110
