Reward Feedback for Learning Control Policies Using Natural Language and Vision Data

ChangeBridge: Patent Grants - AI & Computing (G06N)

Published April 7th, 2026

Detected April 7th, 2026

Summary

Hitachi, Ltd. has been granted US Patent 12596955B2 for systems and methods of providing rewards to machine learning algorithms using natural language task descriptions and vision data. The patented invention receives an image and text-based task description, slices the image into sub-images, and uses an embedding model to generate a relevance distribution for generating rewards. The patent contains 9 claims with a filing date of July 20, 2022.

View original document View source feed page

What changed

Hitachi, Ltd. received US Patent 12596955B2 for a reward feedback system in machine learning. The invention relates to methods for providing rewards to ML algorithms by receiving an image paired with a natural language task description, slicing the image into sub-images, and using an embedding model to generate a relevance distribution between sub-images and the task description for reward generation.

Affected parties including technology companies developing ML-based control systems, robotics manufacturers, and automation firms should review the patent scope to assess licensing needs, freedom-to-operate considerations, and potential competitive implications in the reinforcement learning and robotic control spaces.

What to do next

Monitor patent databases for competing or complementary patents in ML reward systems
Review potential licensing opportunities if developing similar ML control systems
Conduct freedom-to-operate analysis for implementations involving NLP and vision-based ML

Source document (simplified)

← USPTO Patent Grants

Reward feedback for learning control policies using natural language and vision data

Grant US12596955B2 Kind: B2 Apr 07, 2026

Assignee

HITACHI, LTD.

Inventors

Andrew James Walker, Joydeep Acharya

Abstract

Example implementations described herein involve systems and methods for providing a reward to a machine learning algorithm, which can include receiving an image, and a task description defined in text; slicing the image into a plurality of sub-images; executing an embedding model to embed the text of the task description and the sub-images to generate a distribution for the sub-images based on relevance to the task description; and generating the reward from the distribution for the sub-images.