Bug 2416278 (CVE-2025-62426)

Summary:	CVE-2025-62426 vllm: vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted `chat_template_kwargs`
Product:	[Other] Security Response	Reporter:	OSIDB Bzimport <bzimport>
Component:	vulnerability	Assignee:	Product Security DevOps Team <prodsec-dev>
Status:	NEW ---	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	CC:	alinfoot, bbrownin, dtrifiro, jkoehler, lphiri, rbryant, weaton
Target Milestone:	---	Keywords:	Security
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	---
Doc Text:	A vulnerability in vLLM allows an authenticated user to trigger unintended tokenization during chat template processing by supplying crafted chat_template_kwargs to the /v1/chat/completions or /tokenize endpoints. By forcing the server to tokenize very large inputs, an attacker can block the API server’s event loop for extended periods, causing a denial of service and delaying all other requests.	Story Points:	---
Clone Of:		Environment:
Last Closed:		Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description OSIDB Bzimport 2025-11-21 02:01:13 UTC

vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.