Bug 2491486 (CVE-2026-54293)

Summary: CVE-2026-54293 nltk: NLTK: Information Disclosure via Path Traversal in `nltk.data.load()`
Product: [Other] Security Response Reporter: OSIDB Bzimport <bzimport>
Component: vulnerabilityAssignee: Product Security <prodsec-ir-bot>
Status: NEW --- QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: unspecifiedCC: anpicker, bparees, dschmidt, erezende, hasun, jfula, jkoehler, jlanda, jowilson, kshier, lphiri, nyancey, ometelka, ptisnovs, simaishi, stcannon, syedriko, teagle, xdharmai, yguenane
Target Milestone: ---Keywords: Security
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: ---
Doc Text:
A flaw was found in NLTK (Natural Language Toolkit). The `nltk.data.load()` function is vulnerable to path traversal when processing specially crafted `nltk:` URLs. An attacker can exploit a decode-after-check flaw, where URL-encoded path separators and traversal segments bypass security checks. This allows the attacker to read arbitrary files from the filesystem, leading to information disclosure.
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description OSIDB Bzimport 2026-06-22 19:01:32 UTC
NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Prior to 3.10.0-rc1, nltk.data.load() in NLTK is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The unsafe-path regex check is performed before url2pathname() decodes the %xx sequences (a classic decode-after-check / TOCTOU-style flaw), allowing an attacker to bypass the protection documented in NLTK's SECURITY.md and read arbitrary files from the filesystem. While literal traversal strings such as ../../../etc/passwd are correctly blocked, encoded variants such as %2fetc%2fpasswd, %2e%2e%2f..., and ..%2f..%2f slip past the regex and are subsequently decoded into a real filesystem path. This vulnerability is fixed in 3.10.0-rc1.