USPTO Patent Grant: End-to-End Segmentation ASR Model
Summary
The USPTO has granted patent US12586579B2 to Google LLC for an automatic speech recognition (ASR) model that combines end-to-end segmentation with a two-pass cascaded encoder. The patent describes a system that detects where each speech segment ends during a first encoder-decoder pass and passes that information to a second encoder-decoder pass.
What changed
The United States Patent and Trademark Office (USPTO) has granted patent US12586579B2, titled 'End-to-end segmentation in a two-pass cascaded encoder automatic speech recognition model,' to Google LLC. The patent covers an ASR architecture that unifies an end-to-end segmenter with a two-pass cascaded encoder. A first encoder generates a higher-order feature representation from a sequence of acoustic frames. A first decoder uses that representation to produce a first probability distribution at each output step, to indicate whether the step is the end of a speech segment, and to emit an end-of-speech timestamp. A second encoder then takes the first feature representation together with the end-of-speech timestamp and generates a second higher-order feature representation, from which a second decoder generates a second probability distribution.
This patent grant is an intellectual property development and does not impose direct regulatory obligations on businesses. However, companies that develop or use speech recognition technology, particularly in the AI and computing sectors, should be aware of it. It may affect future product development, licensing strategies, and infringement considerations. The patent was filed on November 17, 2023, and granted on March 24, 2026.
Source document (simplified)
End-to-end segmentation in a two-pass cascaded encoder automatic speech recognition model
Grant: US12586579B2
Kind: B2
Grant date: Mar 24, 2026
Assignee
Google LLC
Inventors
Wenqian Ronny Huang, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He
Abstract
A unified end-to-end segmenter and two-pass automatic speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder is configured to receive a sequence of acoustic frames and generate a first higher order feature representation. The first decoder is configured to receive the first higher order feature representation and generate, at each of a plurality of output steps, a first probability distribution and an indication of whether the output step corresponds to an end of speech segment, and emit an end of speech timestamp. The second encoder is configured to receive the first higher order feature representation and the end of speech timestamp, and generate a second higher order feature representation. The second decoder is configured to receive the second higher order feature representation and generate a second probability distribution.
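The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the patented implementation: every function body, the vocabulary, the 10 ms frame duration, and the energy-based end-of-speech rule are hypothetical stand-ins for trained neural networks. Only the wiring between the four components follows the abstract.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]

VOCAB = ["<blank>", "a", "b", "c"]  # hypothetical toy vocabulary
FRAME_MS = 10                       # hypothetical frame duration in milliseconds


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_encoder(frames: List[Vector]) -> List[Vector]:
    """Placeholder: acoustic frames -> first higher order feature representation."""
    return [[math.tanh(x) for x in frame] for frame in frames]


@dataclass
class FirstPassStep:
    probs: Vector                    # first probability distribution
    is_eos: bool                     # end-of-speech-segment indication
    eos_timestamp_ms: Optional[int]  # emitted only when is_eos is True


def first_decoder(features: List[Vector], eos_threshold: float = 0.1) -> List[FirstPassStep]:
    """Placeholder: per output step, a distribution plus an end-of-segment flag.

    The low-energy rule below is an assumption made for this toy; a real
    model would learn the end-of-segment decision.
    """
    steps = []
    for t, feat in enumerate(features):
        probs = softmax(feat[: len(VOCAB)])
        energy = sum(abs(x) for x in feat) / len(feat)
        is_eos = energy < eos_threshold
        steps.append(FirstPassStep(probs, is_eos, t * FRAME_MS if is_eos else None))
    return steps


def second_encoder(features: List[Vector], eos_timestamp_ms: int) -> List[Vector]:
    """Placeholder: first features + end-of-speech timestamp -> second features.

    The toy version keeps frames up to the timestamp and adds the segment mean
    as context.
    """
    end = eos_timestamp_ms // FRAME_MS + 1
    segment = features[:end]
    dim = len(segment[0])
    mean = [sum(f[i] for f in segment) / len(segment) for i in range(dim)]
    return [[x + m for x, m in zip(f, mean)] for f in segment]


def second_decoder(features: List[Vector]) -> List[Vector]:
    """Placeholder: second features -> second probability distribution."""
    return [softmax(f[: len(VOCAB)]) for f in features]


def recognize(frames: List[Vector]) -> Tuple[List[FirstPassStep], int, List[Vector]]:
    """Wire the four components together in the order the abstract gives."""
    feats = first_encoder(frames)
    steps = first_decoder(feats)
    eos_ts = next((s.eos_timestamp_ms for s in steps if s.is_eos), None)
    if eos_ts is None:
        # No end of segment detected: treat the whole input as one segment.
        eos_ts = (len(frames) - 1) * FRAME_MS
    second_feats = second_encoder(feats, eos_ts)
    return steps, eos_ts, second_decoder(second_feats)
```

Calling `recognize` on a short list of four-dimensional frames returns the first-pass steps, the chosen end-of-speech timestamp, and the second-pass distributions for the frames up to that timestamp. Note that the second encoder consumes the first encoder's output rather than the raw audio, which is what makes the arrangement cascaded.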
CPC Classifications
G10L 15/063, G10L 15/16, G10L 15/22, G10L 15/05, G10L 15/02, G10L 15/32, G10L 2015/0631, G10L 15/197, G10L 2015/025, G10L 15/28, G10L 15/30, G10L 15/19, G10L 15/167, G10L 15/183, G10L 15/26, G06N 20/00, G06N 5/00
Filing Date
2023-11-17
Application No.
18512110
Claims
37