Evolutionary thought caching for multi-stage language model systems
Assignee
ATOBEAM TECHNOLOGIES INC.
Inventors
Brian Galvin, Alan McCord
Abstract
A system and method for efficient natural language processing combines large and small language models with a reasoning cache architecture. Input data is processed by a first large language model to generate structured thoughts with associated latent representations, which are cached for future use. Specialized agents perform domain-specific operations on cached thoughts and collaboratively evolve them using genetic algorithms. When new input is received, similar cached or evolved thoughts are retrieved based on latent representation similarity. The input and retrieved thoughts are then routed to a second, smaller language model to generate a response. This architecture reduces computational overhead while preserving response quality, enables reuse of reasoning across sessions and devices, and extends effective context beyond traditional sequence limits. By leveraging prior reasoning, the system minimizes redundant computation and supports scalable deployment across diverse hardware environments.
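The retrieval step described in the abstract — finding cached thoughts whose latent representations are similar to a new input — can be sketched as a simple in-memory cache queried by cosine similarity. This is a minimal illustration, not the patented implementation; the class name `ThoughtCache`, the similarity threshold, and the use of cosine distance are all assumptions for the sake of the example.

```python
import math

def cosine(a, b):
    # Cosine similarity between two latent vectors; 0.0 if either is zero.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ThoughtCache:
    """Hypothetical store of structured thoughts keyed by latent vectors."""

    def __init__(self):
        self.entries = []  # list of (latent_vector, thought_text) pairs

    def add(self, vector, thought):
        self.entries.append((vector, thought))

    def retrieve(self, query_vec, top_k=2, threshold=0.5):
        # Score every cached thought against the query, keep those above
        # the threshold, and return the top_k most similar.
        scored = [(cosine(query_vec, v), t) for v, t in self.entries]
        scored = [(s, t) for s, t in scored if s >= threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in scored[:top_k]]
```

In this sketch, the thoughts returned by `retrieve` would be prepended to the new input before routing to the second, smaller model, so that prior reasoning is reused rather than recomputed.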
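The abstract also states that agents collaboratively evolve cached thoughts using genetic algorithms. A generic evolutionary loop over latent vectors, with single-point crossover, random mutation, and elitism, might look as follows; the operators, population sizes, and fitness signature here are illustrative assumptions, not details drawn from the claims.

```python
import random

def crossover(a, b):
    # Single-point crossover: splice the tail of one parent vector
    # onto the head of the other.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(vec, rate=0.1, scale=0.05):
    # Perturb each component with probability `rate`.
    return [x + random.uniform(-scale, scale) if random.random() < rate else x
            for x in vec]

def evolve(population, fitness, generations=10, elite=2):
    """Evolve a population of latent vectors toward higher fitness.

    Keeps the `elite` best vectors each generation (so the best score
    never decreases) and fills the rest with mutated offspring.
    """
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:max(elite, 2)]
        children = []
        while len(children) < len(population) - elite:
            p1, p2 = random.sample(parents, 2)
            children.append(mutate(crossover(p1, p2)))
        population = population[:elite] + children
    return max(population, key=fitness)
```

In the patented architecture the fitness function would presumably reflect domain-specific utility judged by the specialized agents; here any callable scoring a vector works.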
CPC Classifications
Filing Date
2025-09-05
Application No.
19321168
Claims
18