Julian Wälde and Alexander Klink reported a flaw in the hash function used in the implementation of the Python dictionaries (associative arrays). A specially-crafted set of keys could trigger hash function collisions, which degrade dictionary performance by changing hash table operations complexity from an expected/average O(1) to the worst case O(n). Reporters were able to find colliding strings efficiently using meet in the middle attack. As various web application frameworks for Python automatically pre-fill certain dictionaries with data from the HTTP request (such as GET or POST parameters) for Python web application, a remote attacker could use this flaw to make Python interpreter use excessive amount of CPU time by sending a POST request with large amount of parameters which hash to the same value. This problem is similar to the issue that was previously reported for and fixed in e.g. perl: http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003.pdf
Acknowledgements: Red Hat would like to thank oCERT for reporting this issue. oCERT acknowledges Julian Wälde and Alexander Klink as the original reporters.
This issue was presented on 28C3: http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html Details were posted to full-disclosure: http://seclists.org/fulldisclosure/2011/Dec/477 oCERT advisory: http://www.ocert.org/advisories/ocert-2011-003.html n.runs advisory (copy of the full-disclosure post): http://www.nruns.com/_downloads/advisory28122011.pdf 28C3 slides and recording: http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf http://www.youtube.com/28c3#p/u/22/R2Cq3CLI6H8 Another good write-up of the issue: http://cryptanalysis.eu/blog/2011/12/28/effective-dos-attacks-against-web-application-plattforms-hashdos/
New and old threads on python-dev related to this issue: http://mail.python.org/pipermail/python-dev/2011-December/115116.html http://mail.python.org/pipermail/python-dev/2003-May/035874.html
(In reply to comment #10) > http://mail.python.org/pipermail/python-dev/2011-December/115116.html > http://mail.python.org/pipermail/python-dev/2003-May/035874.html Same discussions in gmane archive, which is better at showing discussions that span over more than one month: http://thread.gmane.org/gmane.comp.python.devel/128267 http://thread.gmane.org/gmane.comp.python.devel/50762
(this is being tracked upstream as http://bugs.python.org/issue13703 )
python-dev status update: http://comments.gmane.org/gmane.comp.python.devel/128617
The collision-counting approach is now favored upstream (over hash randomization): http://mail.python.org/pipermail/python-dev/2012-January/115690.html [+1 from Guido here: http://mail.python.org/pipermail/python-dev/2012-January/115692.html ]
For reference, latest version of lemburg's patch using collision-counting approach is currently this one: http://bugs.python.org/file24151/hash-attack.patch (from 2012-01-06)
Upstream seems to be dithering between collision-counting and hash randomization, or some combination, and the discussion seems to have grown out-of-control. For my part, I've posted some patches upstream: (A) "hash-collision-counting-dmalcolm-2012-01-20-001.patch" http://bugs.python.org/file24286 (see http://bugs.python.org/issue13703#msg151707 ) in which I further developed lemburg's first collision-counting patch I then had a brainstorm and tried another approach: (B) "amortized-probe-counting-dmalcolm-2012-01-20-002.patch" http://bugs.python.org/file24288 (see http://bugs.python.org/issue13703#msg151714 ) (C) "amortized-probe-counting-dmalcolm-2012-01-21-003.patch" http://bugs.python.org/file24289 (see http://bugs.python.org/issue13703#msg151735 ) tracking per-dict statistics, in particular: the amortized average cost per modification of each dict, and raising an exception when the average exceeds a given threshold for a given dict. This seems to work for all of python's test suite, and very quickly detects and complains about hostile input. It's rather experimental though, and the ratio/scale-factor is a number I chose arbitrarily, and I haven't gone through the math or gathered real numbers on real code. There are a few issues that would need fixing (see Antoine Pitrou's comments on it). However, there's an argument that any approach that can lead to dict (or set) raising an exception is flawed: see http://bugs.python.org/issue13703#msg151796 It strikes me that this "protection" may allow yet another new class of attack, it which the attacker slowly poisons inputs, so that when a dict is constructed at some later point, an exception is raised you weren't expecting. If the protection wasn't there you'd get the extreme slowdown, rather an exception. But: my patch is too good :) it's too efficient at detecting when it's under attack: the exception is raised too quickly, and so arguably we've made the job easier for the attacker. But maybe I'm overthinking this? Perhaps I just need to tune things? (D) http://bugs.python.org/issue13703#msg151847 in which I backported haypo's hash randomization patch from the main branch to the 2.7 branch. This patch disables the hash randomization by default, and only does it for string-like types, not numeric types (for reasons noted in that comment). When enabled with "PYTHONHASHRANDOMIZATION=1 python", this patch successfully blocks the reproducer. (there's also another hash randomization patch beyond haypo's, which does a few things a little differently; see python issue 13704). Note that hash randomization is known to break Django's test suite, which is why I'm wary about enabling it by default.
I came up with a hybrid approach, which I believe combines the best aspects of the collision-counting and hash-randomization approaches. I've posted it for upstream review here: http://bugs.python.org/issue13703#msg151939 In this approach, by default, hash() and dict don't use hash randomization - preserving existing behavior. It detects when a given dict is under attack, and transitions just that dict to use randomized hashes. It preserves ABI - however, in order to do this, the randomization only happens for str/unicode/bytes keys. I believe this is a reasonable compromise, given the various attack scenarios on a web server (e.g. it should cover HTTP headers, POST fields, JSON, XML-RPC). It will probably need performance tuning and extensive testing.
Upstream has reached a consensus about a security release (adding hash-randomization in disabled form with an envvar to enable it): http://mail.python.org/pipermail/python-dev/2012-January/115892.html I've been volunteered by the release managers of those upstream branches to create the patches (based on the work I've already done; see (D) above: http://bugs.python.org/issue13703#msg151847 ).
Added CVE as per http://www.openwall.com/lists/oss-security/2012/03/10/3
Created python tracking bugs for this issue Affects: fedora-all [bug 802209]
Created python3 tracking bugs for this issue Affects: fedora-all [bug 802210]
python-2.7.3-3.fc17, python-docs-2.7.3-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
python3-3.2.3-1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
This issue was fixed upstream in versions 2.6.8, 2.7.3, 3.1.5 and 3.2.3: http://python.org/download/releases/2.6.8/ http://python.org/download/releases/2.7.3/ http://python.org/download/releases/3.1.5/ http://python.org/download/releases/3.2.3/ The final upstream solution is to use hash randomization. However, randomization is disabled by default to avoid breaking code that incorrectly relies on specific ordering of dictionary keys. Updated python versions provide following ways to enable randomization: - PYTHONHASHSEED environment variable, which is expected to be set to a decimal number in the range [0,4294967295], or "random", with the following meaning: - "random" - new random seed is generated during interpreter start, seed is not predictable across python interpreter invocations - 0 - use the same seed as was used by older python versions, effectively disabling randomization - other values - use specific fixed seed value, which is different from seed that was used in older python versions; this may be useful in test suites or environments when multiple python instances are expected to produce the same hash values - -R command line option, its use is equivalent to setting PYTHONHASHSEED environment variable to "random" Refer to python command manual page for further details: http://docs.python.org/release/2.6.8/using/cmdline.html#cmdoption-R http://docs.python.org/release/2.6.8/using/cmdline.html#envvar-PYTHONHASHSEED
python-2.7.3-1.fc16, python-docs-2.7.3-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
python3-3.2.3-5.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
python26-2.6.8-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.
As part of this fix, an implementation of os.random was moved from the os module (part of the Python standard library) the posix module (built in to the Python executable binary). This causes no issues for most use cases, but may cause problems where updated Python standard library is used with not updated Python executable. Such problem was reported for Python virtualenv use cases, see: http://bugs.python.org/issue14444#msg158172 The above link also provides instructions on how to fix the problem. These instructions are also available on the virtualenv project news page: http://www.virtualenv.org/en/latest/news.html Warning: Python bugfix releases 2.6.8, 2.7.3, 3.1.5 and 3.2.3 include a change that will cause “import random” to fail with “cannot import name urandom” on any virtualenv created on a Unix host with an earlier release of Python 2.6/2.7/3.1/3.2, if the underlying system Python is upgraded. This is due to the fact that a virtualenv uses the system Python’s standard library but contains its own copy of the Python interpreter, so an upgrade to the system Python results in a mismatch between the version of the Python interpreter and the version of the standard library. It can be fixed by removing $ENV/bin/python and re-running virtualenv on the same target directory with the upgraded Python.
This issue has been addressed in following products: Red Hat Enterprise Linux 6 Via RHSA-2012:0744 https://rhn.redhat.com/errata/RHSA-2012-0744.html
This issue has been addressed in following products: Red Hat Enterprise Linux 5 Via RHSA-2012:0745 https://rhn.redhat.com/errata/RHSA-2012-0745.html
python3-3.2.3-2.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
CVE-2013-7040 is filed as a followup for this bug.