TOKENIZED DATA STREAMING FOR MULTI-MODAL LANGUAGE MODELS
Inventors
Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
Abstract
In various examples, a multi-modal language model may be split across and hosted by multiple devices. For example, a modality (e.g., vision, audio) encoder and/or projector of the multi-modal language model (e.g., a vision language model) may be hosted on a first device (e.g., an in-vehicle SoC) that encodes raw sensor data into corresponding tokens and streams the tokens to a second device (e.g., an external graphics processing unit (GPU) or artificial intelligence (AI) accelerator) that hosts an inference server and a language model (LM) of the multi-modal language model. The LM may return a response indicating the result(s) of a requested task (e.g., a detection task), and the response may be used to take a responsive action (e.g., to control one or more operations of an ego-machine).
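The split described in the abstract can be illustrated with a minimal sketch: a stand-in "modality encoder" on the first device converts raw sensor bytes into token IDs, which are serialized into a compact wire format for streaming to the second device's inference server. All names, the hashing-based "encoder," and the wire layout below are illustrative assumptions, not details from the patent.

```python
import struct
from dataclasses import dataclass

@dataclass
class TokenPacket:
    """One frame's worth of tokens to stream to the remote LM server."""
    frame_id: int
    tokens: list  # token IDs produced by the modality encoder

def encode_frame(frame_id: int, sensor_bytes: bytes, vocab_size: int = 32000) -> TokenPacket:
    # Stand-in "modality encoder": maps raw sensor bytes to pseudo-token IDs.
    # A real encoder (e.g., a vision transformer + projector) would produce
    # learned token embeddings instead.
    tokens = [b % vocab_size for b in sensor_bytes]
    return TokenPacket(frame_id, tokens)

def serialize(packet: TokenPacket) -> bytes:
    # Hypothetical wire format: frame_id and token count as little-endian
    # uint32, followed by one uint32 per token ID.
    header = struct.pack("<II", packet.frame_id, len(packet.tokens))
    body = b"".join(struct.pack("<I", t) for t in packet.tokens)
    return header + body

def deserialize(blob: bytes) -> TokenPacket:
    # The second device (inference server) reverses the framing before
    # feeding the tokens to the language model.
    frame_id, n = struct.unpack_from("<II", blob, 0)
    tokens = [struct.unpack_from("<I", blob, 8 + 4 * i)[0] for i in range(n)]
    return TokenPacket(frame_id, tokens)
```

In this sketch only token IDs cross the device boundary, which mirrors the abstract's point: streaming tokens rather than raw sensor data keeps the heavy LM on the external accelerator while the SoC handles encoding.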
CPC Classifications
Filing Date
2024-10-03
Application No.
18905193