
USPTO Grants Reinforcement Learning Patent to Google LLC

ChangeBridge: Patent Grants - AI & Computing (G06N)
Published March 24th, 2026
Detected March 25th, 2026

Summary

The USPTO has granted Google LLC patent US12585917B2, "Reinforcement learning using advantage estimates." The patent covers methods and systems for computing Q values for actions drawn from a continuous action space, which may be relevant to organizations that develop or deploy reinforcement learning systems.

What changed

The United States Patent and Trademark Office (USPTO) has granted patent US12585917B2 to Google LLC, titled "Reinforcement learning using advantage estimates." The patent, filed on March 25, 2022, and granted on March 24, 2026, describes methods and systems for computing Q values for actions, drawn from a continuous action space, that an agent performs while interacting with an environment. The claimed system combines a value subnetwork, a policy subnetwork, and a subsystem that generates advantage estimates and Q values.

This patent grant is a routine event for a major technology firm and imposes no new regulatory obligations on other entities. It does, however, add to Google's intellectual property portfolio in reinforcement learning. Companies that develop or research reinforcement learning techniques may find the abstract and CPC classifications useful for understanding the competitive landscape and potential intellectual property considerations.

Source document (simplified)


Reinforcement learning using advantage estimates

Grant US12585917B2 Kind: B2 Mar 24, 2026

Assignee

Google LLC

Inventors

Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
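The abstract describes the pieces (a value estimate, an "ideal point" in action space, an advantage estimate, and a Q value) but not how the advantage is computed. The sketch below assumes the quadratic advantage form from the 2016 paper "Continuous Deep Q-Learning with Model-based Acceleration" by the same four inventors: Q(s, u) = V(s) + A(s, u), with A(s, u) = -1/2 (u - mu(s))^T P(s) (u - mu(s)) and P(s) = L(s) L(s)^T. The patent's claims may differ. All function names are hypothetical, and the "subnetworks" are toy fixed functions standing in for trained neural networks.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def value_subnetwork(obs: Vector) -> float:
    """Hypothetical stand-in for the value subnetwork: observation -> V(s)."""
    return sum(obs)  # toy linear map, not a trained network


def policy_subnetwork(obs: Vector, action_dim: int) -> Vector:
    """Hypothetical stand-in for the policy subnetwork: observation -> ideal point mu(s)."""
    return [math.tanh(obs[i % len(obs)]) for i in range(action_dim)]


def l_subnetwork(obs: Vector, action_dim: int) -> Matrix:
    """Hypothetical: builds a lower-triangular L(s) whose diagonal is made strictly
    positive with exp(), so P(s) = L L^T is positive definite (assumed NAF-style form)."""
    L = [[0.0] * action_dim for _ in range(action_dim)]
    for i in range(action_dim):
        for j in range(i + 1):
            raw = 0.1 * (i + j + 1) * obs[0]
            L[i][j] = math.exp(raw) if i == j else raw
    return L


def advantage_estimate(u: Vector, mu: Vector, L: Matrix) -> float:
    """A(s, u) = -1/2 (u - mu)^T L L^T (u - mu) = -1/2 * ||L^T (u - mu)||^2."""
    d = [ui - mi for ui, mi in zip(u, mu)]
    n = len(d)
    # (L^T d)_j = sum_i L[i][j] * d[i]
    lt_d = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return -0.5 * sum(x * x for x in lt_d)


def q_value(obs: Vector, u: Vector) -> float:
    """Q(s, u) = V(s) + A(s, u) for a particular point u in the continuous action space."""
    n = len(u)
    v = value_subnetwork(obs)
    mu = policy_subnetwork(obs, n)
    L = l_subnetwork(obs, n)
    return v + advantage_estimate(u, mu, L)
```

Under this assumed parameterization, P(s) is positive definite, so the advantage is zero at the ideal point and negative everywhere else. The greedy action is therefore simply mu(s), which avoids a search over the continuous action space when selecting actions or computing Q-learning targets.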

CPC Classifications

G06N 3/0427 G06N 3/08 G06N 3/042 G06N 3/092 G06N 3/0464 G06N 3/04 G06N 3/045 G06N 3/0455 G06N 3/084 G06N 3/09 G06N 3/044 G06N 3/0442 G06N 3/047 G06N 3/0475 G06N 3/088 G06N 3/091 G06N 3/094 G06N 20/00 G06T 2207/20081

Filing Date

2022-03-25

Application No.

17704721

Claims

21


Named provisions

Reinforcement learning using advantage estimates

Classification

Agency
USPTO
Published
March 24th, 2026
Instrument
Notice
Legal weight
Non-binding
Stage
Final
Change scope
Minor
Document ID
US12585917B2

Who this affects

Applies to
Technology companies
Industry sector
5112 Software & Technology
Activity scope
AI Development
Geographic scope
United States (US)

Taxonomy

Primary area
Intellectual Property
Operational domain
IT Security
Topics
Artificial Intelligence, Machine Learning
