EPO Patent EP4711981A1 for AI Model Decoding by Google LLC
Summary
The European Patent Office has published patent application EP4711981A1, filed by Google LLC, describing a method for reducing latency in AI model decoding. The technique adds an adapter layer that lets the model exit early to draft tokens speculatively, with the remaining layers verifying those drafts.
What changed
The European Patent Office (EPO) has published patent application EP4711981A1, filed by Google LLC, titled "EFFICIENT ESTIMATION & VERIFICATION WITH EARLY EXITS." The application describes a computer-implemented method for performing AI model decoding with reduced latency. A pre-trained sequence processing model is modified to include an adapter layer that receives the intermediate representation produced by a chosen intermediate layer and predicts an output token from it; the adapter is trained while the model's main layers are held frozen. At deployment, the adapter layer and the layers up to the exit point perform speculative token decoding, while the layers after the exit point verify the drafted tokens.
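The flow described in the abstract can be sketched in a few lines. The code below is a minimal illustration, not the patented implementation: plain NumPy linear maps stand in for the "sequence processing model" layers, and all names, sizes, and the toy training step are illustrative assumptions. It shows the two claimed phases: training only the adapter head against a frozen backbone, then using the early exit to draft a token that the remaining layers verify.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, N_LAYERS, EXIT_AT = 16, 8, 6, 2   # hypothetical toy sizes
LR, STEPS = 0.1, 50

# Frozen pre-trained layers and output head (never updated below).
layers = [rng.standard_normal((DIM, DIM)) * 0.3 for _ in range(N_LAYERS)]
head = rng.standard_normal((DIM, VOCAB))
# Trainable adapter head attached after intermediate layer EXIT_AT.
adapter = np.zeros((DIM, VOCAB))

def run_layers(h, lo, hi):
    """Apply frozen layers lo..hi-1 to hidden state h."""
    for w in layers[lo:hi]:
        h = np.tanh(h @ w)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# --- Training: fit only the adapter, backbone held frozen ----------
h0 = rng.standard_normal(DIM)
mid = run_layers(h0, 0, EXIT_AT)          # intermediate representation
# Use the full model's own prediction as the adapter's training target.
target = int(np.argmax(run_layers(mid, EXIT_AT, N_LAYERS) @ head))

def adapter_loss():
    return float(-np.log(softmax(mid @ adapter)[target]))

loss_before = adapter_loss()
for _ in range(STEPS):
    p = softmax(mid @ adapter)
    p[target] -= 1.0                      # d(cross-entropy)/d(logits)
    adapter -= LR * np.outer(mid, p)      # gradient step on adapter only
loss_after = adapter_loss()

# --- Deployment: speculative decoding with verification ------------
draft = int(np.argmax(mid @ adapter))     # early-exit draft token
final = int(np.argmax(run_layers(mid, EXIT_AT, N_LAYERS) @ head))
accepted = draft == final                 # keep the draft iff the full model agrees
```

The latency saving comes from the accept path: when the cheap early-exit draft matches the full model's verification, the remaining layers only need a single confirming pass rather than driving token generation themselves.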
This publication is a patent application, not a regulatory rule or guidance. It does not impose any new compliance obligations or deadlines on regulated entities. However, it signifies a development in AI technology that may be relevant for companies involved in AI research, development, and deployment, particularly concerning intellectual property and competitive landscapes in the AI sector.
Source document (simplified)
EFFICIENT ESTIMATION & VERIFICATION WITH EARLY EXITS
Publication: EP4711981A1 (Kind: A1), published Mar 18, 2026
Applicants
GOOGLE LLC
Inventors
SCHUSTER, Tal, KOROTKOV, Ivan, JI, Ziwei, KIM, Seungyeon
Abstract
One example aspect is directed to a computer-implemented method (400) for performing model decoding with reduced latency. The method includes obtaining (402) a pre-trained sequence processing model comprising a plurality of layers. The method includes modifying (404) the sequence processing model to contain an adapter layer (106) that is configured to receive and process an intermediate representation generated by a particular intermediate layer of the plurality of layers to predict an output token. The method includes training (406) the adapter layer while holding the plurality of layers of the sequence processing model frozen. The method includes deploying (408) the sequence processing model for speculative decoding in which the adapter layer, the particular intermediate layer, and the plurality of layers (104) that precede the particular intermediate layer perform speculative token decoding and the plurality of layers (108) that are subsequent to the particular intermediate layer perform token verification.
IPC Classifications
G06N 3/045, G06N 3/084, G06N 3/096, G06N 3/0442, G06N 3/0464, G06N 3/0495, G06N 3/09, G06N 3/094 (all 2023.01)
Designated States
AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LI, LT, LU, LV, MC, ME, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR