Cache techniques for large language model processing
Assignee
Amazon Technologies, Inc.
Inventors
Sixing Lu, Xiaocheng Deng, Yicheng Wang, Chengyuan Ma, Gang Chen
Abstract
Techniques for cache management for LLM processing are described. Example embodiments include a signal hashing model that generates a key for particular context data. An LLM output corresponding to the context data is stored in a cache along with the key. For a user input received by the system, a cache lookup is performed using a key for context data corresponding to the received user input. For a cache hit, the stored output is used to respond to the user input. For a cache miss, an LLM processes the context data and the user input to generate an output within a first timeout. If the LLM is unable to generate an output within the first timeout, then in some cases, the LLM is allowed to continue processing until a second timeout, and a final or partial output from the LLM is stored in the cache.
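The keyed lookup and two-timeout flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the SHA-256 digest standing in for the "signal hashing model", the `fake_llm_stream` token generator, the fallback reply, and the function names are all assumptions made for the example.

```python
import hashlib
import time

FALLBACK = "Sorry, I need more time for that one."

def context_key(context: str) -> str:
    """Stand-in for the patent's 'signal hashing model': here just a
    SHA-256 digest of the context string (an illustrative assumption)."""
    return hashlib.sha256(context.encode("utf-8")).hexdigest()

def fake_llm_stream(context: str, user_input: str):
    """Hypothetical token stream standing in for a real LLM call."""
    for token in ["The", "quick", "answer", "is", "42."]:
        time.sleep(0.01)  # simulate per-token generation latency
        yield token

def respond(cache: dict, context: str, user_input: str,
            first_timeout: float, second_timeout: float) -> str:
    key = context_key(context)
    if key in cache:
        return cache[key]  # cache hit: respond from the stored output

    # Cache miss: run the LLM, stopping hard at the second timeout.
    start = time.monotonic()
    tokens = []
    for token in fake_llm_stream(context, user_input):
        if time.monotonic() - start > second_timeout:
            break  # keep whatever partial output exists so far
        tokens.append(token)
    elapsed = time.monotonic() - start

    output = " ".join(tokens)
    cache[key] = output  # store the final (or partial) output for reuse

    # If the first timeout was exceeded, reply with a fallback now; the
    # cached output can still serve a repeated request later.
    return output if elapsed <= first_timeout else FALLBACK
```

A repeated request with the same context then hits the cache and skips LLM processing entirely, which is the latency win the abstract describes.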
CPC Classifications
Filing Date
2023-08-21
Application No.
18452861
Claims
20