Bug 2416278 (CVE-2025-62426)

Summary: CVE-2025-62426 vllm: vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted `chat_template_kwargs`
Product: [Other] Security Response Reporter: OSIDB Bzimport <bzimport>
Component: vulnerabilityAssignee: Product Security DevOps Team <prodsec-dev>
Status: NEW --- QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: unspecifiedCC: alinfoot, bbrownin, dtrifiro, jkoehler, lphiri, rbryant, weaton
Target Milestone: ---Keywords: Security
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: ---
Doc Text:
A vulnerability in vLLM allows an authenticated user to trigger unintended tokenization during chat template processing by supplying crafted chat_template_kwargs to the /v1/chat/completions or /tokenize endpoints. By forcing the server to tokenize very large inputs, an attacker can block the API server’s event loop for extended periods, causing a denial of service and delaying all other requests.
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description OSIDB Bzimport 2025-11-21 02:01:13 UTC
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.