Bug 2291228 (CVE-2024-5206) - CVE-2024-5206 scikit-learn: Possible sensitive data leak
Summary: CVE-2024-5206 scikit-learn: Possible sensitive data leak
Keywords:
Status: NEW
Alias: CVE-2024-5206
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Product Security
QA Contact:
URL:
Whiteboard:
Depends On: 2291229 2291230
Blocks:
Reported: 2024-06-10 21:06 UTC by Marco Benatto
Modified: 2025-04-01 08:29 UTC (History)
CC List: 16 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Marco Benatto 2024-06-10 21:06:17 UTC
A sensitive data leak was identified in scikit-learn's TfidfVectorizer. It affects versions up to and including 1.4.1.post1 and was fixed in version 1.5.0. The vectorizer unexpectedly stores all tokens present in the training data in the `stop_words_` attribute, rather than only the subset of tokens required for the TF-IDF technique to function. As a result, `stop_words_` can contain tokens that were meant to be discarded and not stored, such as passwords or keys, which may leak sensitive information. The impact depends on the nature of the data processed by the vectorizer.

https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8
https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c
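
The pattern described above can be sketched with a toy model. This is not scikit-learn's code: `ToyVectorizer`, `scrub_stop_words`, the `min_df` pruning rule, and the sample documents are all hypothetical and exist only to illustrate how tokens dropped from the vocabulary can still end up in serialized state. Only the attribute names `vocabulary_` and `stop_words_` are borrowed from scikit-learn.

```python
import pickle
from collections import Counter


class ToyVectorizer:
    """Hypothetical stand-in for a vectorizer; NOT scikit-learn code."""

    def __init__(self, min_df=2):
        self.min_df = min_df

    def fit(self, docs):
        # Document frequency: number of documents each token appears in.
        df = Counter()
        for doc in docs:
            df.update(set(doc.lower().split()))
        kept = sorted(tok for tok, n in df.items() if n >= self.min_df)
        self.vocabulary_ = {tok: i for i, tok in enumerate(kept)}
        # The problematic pattern: discarded tokens stay on the object.
        self.stop_words_ = set(df) - set(kept)
        return self


def scrub_stop_words(vectorizer):
    """Clear the retained-token attribute, if present, before serializing."""
    if hasattr(vectorizer, "stop_words_"):
        vectorizer.stop_words_ = None
    return vectorizer


# Made-up training data; "hunter2" plays the role of a secret token.
docs = ["login user alice", "login user bob hunter2"]
vec = ToyVectorizer(min_df=2).fit(docs)

# Serialize the instance state (a plain dict) and look for the secret.
leaked = b"hunter2" in pickle.dumps(vars(vec))

scrub_stop_words(vec)
clean = b"hunter2" not in pickle.dumps(vars(vec))

print("secret in serialized state before scrub:", leaked)
print("secret absent after scrub:", clean)
```

For affected scikit-learn versions, the same idea applies to a fitted vectorizer: the library's documentation for those versions notes that `stop_words_` is provided only for introspection and can be removed with `delattr` or set to `None` before pickling. Upgrading to 1.5.0 or later remains the actual fix.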

Comment 1 Marco Benatto 2024-06-10 21:08:39 UTC
Created python-imbalanced-learn tracking bugs for this issue:

Affects: fedora-all [bug 2291229]


Created python-scikit-learn tracking bugs for this issue:

Affects: fedora-all [bug 2291230]

