USING CACHED EXPERTS IN MIXTURE OF EXPERTS (MOE)
Inventors
Andrii SKLIAR, Babak EHTESHAMI BEJNORDI, Ties Jehan VAN ROZENDAAL, Marinus Willem VAN BAALEN, Markus NAGEL, Paul Nicholas WHATMOUGH
Abstract
Systems and techniques are described herein for processing tokens. For instance, a method for processing tokens is provided. The method may include processing a token at a router model to generate a recommendation of a subset of expert models from a plurality of expert models to use for further processing of the token; selecting a number of expert models to use for the further processing of the token based on the recommendation of the subset of expert models and based on cached expert models of the plurality of expert models stored in a cache memory; and processing the token using the selected number of expert models.
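The selection step described above can be illustrated with a minimal Python sketch. The abstract does not state how the router recommendation and the cache contents are combined, so the policy here (take recommended experts that are already cached first, then fill remaining slots in router order) is an assumption chosen for illustration. The function names `recommend_experts` and `select_experts` and their parameters are hypothetical and do not come from the application.

```python
from typing import List, Sequence, Set


def recommend_experts(router_scores: Sequence[float], num_candidates: int) -> List[int]:
    """Return the indices of the highest-scoring experts, best first.

    Stands in for the router model's recommendation of a subset of experts.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i],
                    reverse=True)
    return ranked[:num_candidates]


def select_experts(recommended: Sequence[int],
                   cached: Set[int],
                   num_selected: int) -> List[int]:
    """Choose experts from the recommendation, preferring cached ones.

    Assumed policy (not taken from the source): recommended experts that
    are already in cache come first, in router order; any remaining slots
    are filled with uncached recommended experts, also in router order.
    """
    hits = [e for e in recommended if e in cached]
    misses = [e for e in recommended if e not in cached]
    return (hits + misses)[:num_selected]


# Hypothetical router scores for five experts and a cache holding experts 0 and 4.
scores = [0.10, 0.40, 0.05, 0.30, 0.15]
recommended = recommend_experts(scores, num_candidates=3)
selected = select_experts(recommended, cached={0, 4}, num_selected=2)
```

With these hypothetical inputs, expert 4 is chosen ahead of the higher-scoring expert 3 because it is already in cache, which avoids loading one expert from slower memory at the cost of departing from the router's pure ranking.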
Filing Date
2025-01-10
Application No.
19017238