Cache techniques for large language model processing
Assignee
Amazon Technologies, Inc.
Inventors
Sixing Lu, Xiaocheng Deng, Yicheng Wang, Chengyuan Ma, Gang Chen
Abstract
Techniques for cache management for LLM processing are described. Example embodiments include a signal hashing model that generates a key for particular context data. An LLM output corresponding to the context data is stored in a cache along with the key. For a user input received by the system, a cache lookup is performed using a key for context data corresponding to the received user input. For a cache hit, the stored output is used to respond to the user input. For a cache miss, an LLM processes the context data and the user input to generate an output within a first timeout. If the LLM is unable to generate an output within the first timeout, then in some cases, the LLM is allowed to continue processing until a second timeout, and a final or partial output from the LLM is stored in the cache.
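The keyed lookup and two-timeout flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the SHA-256 digest standing in for the "signal hashing model", the `fake_llm_stream` token generator, the fallback reply, and the function names are all assumptions made for the example.

```python
import hashlib
import time

FALLBACK = "Sorry, I need more time for that one."

def context_key(context: str) -> str:
    """Stand-in for the patent's 'signal hashing model': here just a
    SHA-256 digest of the context string (an illustrative assumption)."""
    return hashlib.sha256(context.encode("utf-8")).hexdigest()

def fake_llm_stream(context: str, user_input: str):
    """Hypothetical token stream standing in for a real LLM call."""
    for token in ["The", "quick", "answer", "is", "42."]:
        time.sleep(0.01)  # simulate per-token generation latency
        yield token

def respond(cache: dict, context: str, user_input: str,
            first_timeout: float, second_timeout: float) -> str:
    key = context_key(context)
    if key in cache:
        return cache[key]  # cache hit: respond from the stored output

    # Cache miss: run the LLM, stopping hard at the second timeout.
    start = time.monotonic()
    tokens = []
    for token in fake_llm_stream(context, user_input):
        if time.monotonic() - start > second_timeout:
            break  # keep whatever partial output exists so far
        tokens.append(token)
    elapsed = time.monotonic() - start

    output = " ".join(tokens)
    cache[key] = output  # store the final (or partial) output for reuse

    # If the first timeout was exceeded, reply with a fallback now; the
    # cached output can still serve a repeated request later.
    return output if elapsed <= first_timeout else FALLBACK
```

A repeated request with the same context then hits the cache and skips LLM processing entirely, which is the latency win the abstract describes.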
CPC Classifications
Filing Date
2023-08-21
Application No.
18452861
Claims
20