EPO Patent EP4711981A1 for AI Model Decoding by Google LLC
Summary
The European Patent Office has published patent application EP4711981A1, filed by Google LLC, describing a method for reducing latency in AI model decoding. The technique adds an adapter layer that lets the model exit early to draft tokens speculatively, with the remaining layers verifying those drafts.
What changed
The European Patent Office (EPO) has published patent application EP4711981A1, filed by Google LLC, titled "EFFICIENT ESTIMATION & VERIFICATION WITH EARLY EXITS." The application describes a computer-implemented method for performing AI model decoding with reduced latency. A pre-trained sequence processing model is modified to include an adapter layer that receives the intermediate representation produced by a chosen intermediate layer and predicts an output token from it; the adapter is trained while the model's main layers are held frozen. At deployment, the adapter layer and the layers up to the exit point perform speculative token decoding, while the layers after the exit point verify the drafted tokens.
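The flow described in the abstract can be sketched in a few lines. The code below is a minimal illustration, not the patented implementation: plain NumPy linear maps stand in for the "sequence processing model" layers, and all names, sizes, and the toy training step are illustrative assumptions. It shows the two claimed phases: training only the adapter head against a frozen backbone, then using the early exit to draft a token that the remaining layers verify.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, N_LAYERS, EXIT_AT = 16, 8, 6, 2   # hypothetical toy sizes
LR, STEPS = 0.1, 50

# Frozen pre-trained layers and output head (never updated below).
layers = [rng.standard_normal((DIM, DIM)) * 0.3 for _ in range(N_LAYERS)]
head = rng.standard_normal((DIM, VOCAB))
# Trainable adapter head attached after intermediate layer EXIT_AT.
adapter = np.zeros((DIM, VOCAB))

def run_layers(h, lo, hi):
    """Apply frozen layers lo..hi-1 to hidden state h."""
    for w in layers[lo:hi]:
        h = np.tanh(h @ w)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# --- Training: fit only the adapter, backbone held frozen ----------
h0 = rng.standard_normal(DIM)
mid = run_layers(h0, 0, EXIT_AT)          # intermediate representation
# Use the full model's own prediction as the adapter's training target.
target = int(np.argmax(run_layers(mid, EXIT_AT, N_LAYERS) @ head))

def adapter_loss():
    return float(-np.log(softmax(mid @ adapter)[target]))

loss_before = adapter_loss()
for _ in range(STEPS):
    p = softmax(mid @ adapter)
    p[target] -= 1.0                      # d(cross-entropy)/d(logits)
    adapter -= LR * np.outer(mid, p)      # gradient step on adapter only
loss_after = adapter_loss()

# --- Deployment: speculative decoding with verification ------------
draft = int(np.argmax(mid @ adapter))     # early-exit draft token
final = int(np.argmax(run_layers(mid, EXIT_AT, N_LAYERS) @ head))
accepted = draft == final                 # keep the draft iff the full model agrees
```

The latency saving comes from the accept path: when the cheap early-exit draft matches the full model's verification, the remaining layers only need a single confirming pass rather than driving token generation themselves.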
This publication is a patent application, not a regulatory rule or guidance. It does not impose any new compliance obligations or deadlines on regulated entities. However, it signifies a development in AI technology that may be relevant for companies involved in AI research, development, and deployment, particularly concerning intellectual property and competitive landscapes in the AI sector.
Source document (simplified)
EFFICIENT ESTIMATION & VERIFICATION WITH EARLY EXITS
Publication: EP4711981A1 (Kind: A1), published Mar 18, 2026
Applicants
GOOGLE LLC
Inventors
SCHUSTER, Tal, KOROTKOV, Ivan, JI, Ziwei, KIM, Seungyeon
Abstract
One example aspect is directed to a computer-implemented method (400) for performing model decoding with reduced latency. The method includes obtaining (402) a pre-trained sequence processing model comprising a plurality of layers. The method includes modifying (404) the sequence processing model to contain an adapter layer (106) that is configured to receive and process an intermediate representation generated by a particular intermediate layer of the plurality of layers to predict an output token. The method includes training (406) the adapter layer while holding the plurality of layers of the sequence processing model frozen. The method includes deploying (408) the sequence processing model for speculative decoding in which the adapter layer, the particular intermediate layer, and the plurality of layers (104) that precede the particular intermediate layer perform speculative token decoding and the plurality of layers (108) that are subsequent to the particular intermediate layer perform token verification.
IPC Classifications
G06N 3/045, G06N 3/084, G06N 3/096, G06N 3/0442, G06N 3/0464, G06N 3/0495, G06N 3/09, G06N 3/094 (all 2023.01)
Designated States
AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LI, LT, LU, LV, MC, ME, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR