REMOTE LINK FAILURE MANAGEMENT ENGINE IN AN ARTIFICIAL INTELLIGENCE BACKEND NETWORK SYSTEM
Inventors
Prashant RANJAN
Abstract
Methods, systems, and devices for providing remote link failure management using a remote link failure management engine of an artificial intelligence (AI) backend network system are described. Remote link failure management includes hardware-based techniques associated with AI hardware (e.g., an AI accelerator or an AI System on Chip, “SoC”), where the techniques are employed to address malfunctions or breakdowns in components that facilitate connectivity and communication between the AI hardware and other components. The remote link failure management engine supports detecting, mitigating, and recovering from failures in the ports and links of AI hardware. In particular, remote link failure management can be provided for AI hardware based on an Artificial Intelligence Transport Layer Protocol (ATL). ATL enables adding a health bit to ATL data and ACK packets to exchange local port health status between a Sender device and a Receiver device, where each device is an artificial intelligence Network Interface Controller (ANC).
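The health-bit exchange described in the abstract can be illustrated with a minimal sketch. The abstract does not specify a packet layout, so everything here is an assumption made for illustration: the flags byte, the bit positions `HEALTH_BIT` and `ACK_BIT`, the convention that a set bit means "healthy", and the `Anc` class with its `send` and `receive` methods are all hypothetical names, not part of ATL as described.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical bit positions inside a one-byte flags field (not from the source).
HEALTH_BIT = 0x01  # assumed: set when the sending device's local port is healthy
ACK_BIT = 0x02     # assumed: set on ACK packets, clear on data packets


def encode_flags(is_ack: bool, local_port_healthy: bool) -> int:
    """Pack the packet type and the local port health into a flags byte."""
    flags = 0
    if is_ack:
        flags |= ACK_BIT
    if local_port_healthy:
        flags |= HEALTH_BIT
    return flags


def decode_flags(flags: int) -> Tuple[bool, bool]:
    """Return (is_ack, peer_port_healthy) from a received flags byte."""
    return bool(flags & ACK_BIT), bool(flags & HEALTH_BIT)


@dataclass
class Anc:
    """Hypothetical model of an ANC endpoint tracking local and remote port health."""
    name: str
    local_port_healthy: bool = True
    remote_port_healthy: bool = True  # last health status learned from the peer

    def send(self, is_ack: bool) -> int:
        # Every outgoing data or ACK packet carries the local port health.
        return encode_flags(is_ack, self.local_port_healthy)

    def receive(self, flags: int) -> bool:
        # Record the peer's advertised port health; return whether it was an ACK.
        is_ack, peer_healthy = decode_flags(flags)
        self.remote_port_healthy = peer_healthy
        return is_ack


# The Sender's port is healthy; the Receiver's local port has failed.
sender = Anc("sender")
receiver = Anc("receiver", local_port_healthy=False)

receiver.receive(sender.send(is_ack=False))  # data packet: Sender -> Receiver
sender.receive(receiver.send(is_ack=True))   # ACK packet: Receiver -> Sender
```

After this round trip the Sender has learned from the ACK that the Receiver's port is unhealthy, while the Receiver has learned from the data packet that the Sender's port is healthy. What each side does with that information (mitigation and recovery) is outside the scope of this sketch.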
CPC Classifications
Filing Date
2024-09-19
Application No.
18890163