TOKENIZED DATA STREAMING FOR MULTI-MODAL LANGUAGE MODELS
Inventors
Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
Abstract
In various examples, a multi-modal language model may be split across and hosted by multiple devices. For example, a modality (e.g., vision, audio) encoder and/or projector of the multi-modal language model (e.g., a vision language model) may be hosted on a first device (e.g., an in-vehicle SoC) that encodes raw sensor data into corresponding tokens and streams the tokens to a second device (e.g., an external graphics processing unit (GPU) or artificial intelligence (AI) accelerator) that hosts an inference server and a language model (LM) of the multi-modal language model. The LM may return a response indicating the result(s) of a requested task (e.g., a detection task), and the response may be used to take a responsive action (e.g., to control one or more operations of an ego-machine).
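The split described in the abstract can be illustrated with a minimal sketch: a stand-in "modality encoder" on the first device converts raw sensor bytes into token IDs, which are serialized into a compact wire format for streaming to the second device's inference server. All names, the hashing-based "encoder," and the wire layout below are illustrative assumptions, not details from the patent.

```python
import struct
from dataclasses import dataclass

@dataclass
class TokenPacket:
    """One frame's worth of tokens to stream to the remote LM server."""
    frame_id: int
    tokens: list  # token IDs produced by the modality encoder

def encode_frame(frame_id: int, sensor_bytes: bytes, vocab_size: int = 32000) -> TokenPacket:
    # Stand-in "modality encoder": maps raw sensor bytes to pseudo-token IDs.
    # A real encoder (e.g., a vision transformer + projector) would produce
    # learned token embeddings instead.
    tokens = [b % vocab_size for b in sensor_bytes]
    return TokenPacket(frame_id, tokens)

def serialize(packet: TokenPacket) -> bytes:
    # Hypothetical wire format: frame_id and token count as little-endian
    # uint32, followed by one uint32 per token ID.
    header = struct.pack("<II", packet.frame_id, len(packet.tokens))
    body = b"".join(struct.pack("<I", t) for t in packet.tokens)
    return header + body

def deserialize(blob: bytes) -> TokenPacket:
    # The second device (inference server) reverses the framing before
    # feeding the tokens to the language model.
    frame_id, n = struct.unpack_from("<II", blob, 0)
    tokens = [struct.unpack_from("<I", blob, 8 + 4 * i)[0] for i in range(n)]
    return TokenPacket(frame_id, tokens)
```

In this sketch only token IDs cross the device boundary, which mirrors the abstract's point: streaming tokens rather than raw sensor data keeps the heavy LM on the external accelerator while the SoC handles encoding.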
CPC Classifications
Filing Date
2024-10-03
Application No.
18905193