Bug 2376770 (CVE-2025-3044)

Summary: CVE-2025-3044 llama-index: MD5 Hash Collision in llama_index
Product: [Other] Security Response
Reporter: OSIDB Bzimport <bzimport>
Component: vulnerability
Assignee: Product Security DevOps Team <prodsec-dev>
Status: NEW ---
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: unspecified
CC: anpicker, bparees, haoli, hasun, hkataria, jajackso, jcammara, jfula, jmitchel, jneedle, jowilson, jwong, kegrant, koliveir, kshier, mabashia, nyancey, ometelka, pbraun, ptisnovs, shvarugh, simaishi, smcdonal, stcannon, syedriko, teagle, tfister, thavo, ttakamiy, xdharmai, yguenane
Target Milestone: ---
Keywords: Security
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: ---
Doc Text:
A hash collision flaw was found in llama_index. The ArxivReader class uses the MD5 function, and because MD5 is not collision-resistant, an attacker can craft colliding inputs.

Description OSIDB Bzimport 2025-07-07 10:01:45 UTC
A vulnerability in the ArxivReader class of the run-llama/llama_index repository, versions up to v0.12.22.post1, allows for MD5 hash collisions when generating filenames for downloaded papers. This can lead to data loss as papers with identical titles but different contents may overwrite each other, preventing some papers from being processed for AI model training. The issue is resolved in version 0.12.28.
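
The overwrite described above can be illustrated with a minimal sketch. It is not the ArxivReader code: it assumes, based only on the "identical titles" wording, that the filename is an MD5 digest of the paper title. The helper names (md5_filename, safer_filename), the paper IDs, and the contents are all hypothetical, and the SHA-256 variant is one possible mitigation, not necessarily the fix shipped in 0.12.28.

```python
import hashlib


def md5_filename(title: str) -> str:
    # Hypothetical stand-in: derive the filename from the title only.
    return hashlib.md5(title.encode("utf-8")).hexdigest() + ".pdf"


def safer_filename(entry_id: str, content: bytes) -> str:
    # Hypothetical mitigation: key the name on a unique ID plus the content,
    # using SHA-256 instead of MD5.
    data = entry_id.encode("utf-8") + b"\0" + content
    return hashlib.sha256(data).hexdigest() + ".pdf"


# Two distinct papers that share a title (made-up data).
papers = [
    {"id": "example-id-1", "title": "A Survey", "content": b"first paper"},
    {"id": "example-id-2", "title": "A Survey", "content": b"second paper"},
]

# Dicts stand in for a download directory: key = filename, value = file bytes.
title_keyed = {}
id_keyed = {}
for paper in papers:
    # Same title -> same filename, so the second paper replaces the first.
    title_keyed[md5_filename(paper["title"])] = paper["content"]
    # Different ID and content -> different filename, so both survive.
    id_keyed[safer_filename(paper["id"], paper["content"])] = paper["content"]

print(len(title_keyed), "file(s) kept with title-derived names")
print(len(id_keyed), "file(s) kept with ID-and-content-derived names")
```

In this sketch the data loss comes from keying the filename on a non-unique input. A crafted MD5 collision between different inputs would cause the same overwrite, which is why the sketch changes both the hash function and the hashed input.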