Red Hat Bugzilla – Bug 263261
CVE-2007-4559 python tarfile module directory traversal
Last modified: 2010-12-22 17:25:48 EST
Common Vulnerabilities and Exposures assigned the identifier CVE-2007-4559 to the following vulnerability:

Directory traversal vulnerability in the (1) extract and (2) extractall functions in the tarfile module in Python allows user-assisted remote attackers to overwrite arbitrary files via a .. (dot dot) sequence in filenames in a TAR archive, a related issue to CVE-2001-1267.

References:
The issue and additional attack vectors were discussed in the following thread on the python-dev mailing list:
http://mail.python.org/pipermail/python-dev/2007-August/074290.html
Upstream bug tracking possible fixes for the issue:
http://bugs.python.org/issue1044
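To make the "dot dot" problem concrete, here is a minimal sketch of a containment check a caller could run on each member name before extracting it. The helper is_within_directory and the destination "/tmp/extract" are hypothetical (neither is part of the tarfile module nor of the patches discussed in issue1044), and POSIX-style paths are assumed.

```python
import os


def is_within_directory(directory, member_name):
    """Hypothetical helper: True if member_name, resolved against directory, stays inside it."""
    # realpath() normalizes ".." and also follows symlinks that already exist
    # on disk, so the comparison is made on where the file would really land.
    # An absolute member_name makes os.path.join() discard `directory` entirely.
    base = os.path.realpath(directory)
    target = os.path.realpath(os.path.join(directory, member_name))
    # commonpath() compares whole path components, so "/tmp/extractx" is not
    # mistaken for a child of "/tmp/extract" (a plain startswith() would be).
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    for name in ("docs/index.html", "../etc/passwd", "/etc/passwd", "a/../../b"):
        print(name, is_within_directory("/tmp/extract", name))
```

Only "docs/index.html" is reported as staying inside the destination; the other three names resolve outside it.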
Ok, so they seem confused about whether they want to fix anything or just define what it currently does as "correct". Also, the patches they are proposing only "fix" paths in a tarfile prefixed with "../" or "/" ... they are trying to fix the symlink attacks by just checking the result of the link (which is very different from what GNU tar does). And I'm not sure that's all of the known tar attacks, is it?

Summary:
. symlinks linking to just ".." work (can be used repeatedly to walk up the tree).
. symlinks pointing to absolute paths can't be used.
. path checking doesn't check for either "./../foo" or "xyz/../../foo" type attacks.
. failing the security checks results in an exception being thrown.
. I'm pretty sure self._check_path(os.path.join(tarinfo.name, tarinfo.linkname)) is wrong; I think they meant "current path inside the tarfile", not "path of symlink" ... so that "foo.html -> ../src/foo.html" works if the link is in a docs directory.
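The first and last points can be illustrated with a small sketch. check_path below is a hypothetical stand-in for the proposed _check_path(), assuming it simply rejects a path that, once normalized, is absolute or starts with "..". Joining the link target onto the link's own name lets a top-level "up -> .." link through, while resolving it from the link's directory does not.

```python
import posixpath


def check_path(path):
    """Hypothetical stand-in for the proposed _check_path(): reject a path
    that, once normalized, is absolute or climbs out with '..'."""
    path = posixpath.normpath(path)
    return not (path.startswith("/") or path == ".." or path.startswith("../"))


def checked_as_proposed(name, linkname):
    # Mirrors _check_path(os.path.join(tarinfo.name, tarinfo.linkname)):
    # the target is appended to the symlink's own path.
    return check_path(posixpath.join(name, linkname))


def checked_from_link_dir(name, linkname):
    # Resolves the target from the directory that contains the symlink,
    # which is where the filesystem will actually look.
    return check_path(posixpath.join(posixpath.dirname(name), linkname))


if __name__ == "__main__":
    # "up/.." normalizes to ".", so the proposed check accepts the link,
    # although it really points at the parent of the extraction directory.
    print(checked_as_proposed("up", ".."))    # True  (accepted)
    print(checked_from_link_dir("up", ".."))  # False (rejected)
```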
James, thanks for the feedback. I've re-checked the list of directory traversal vulnerabilities reported against tar in the past few years. Not very surprisingly, the reported attack vectors use either absolute paths or paths with '..'s. Such paths can be used in a file/directory name or in a (directory) symlink. Additionally, there were some issues specific to the implementation of the checks ('/../' detected correctly, but not '//../').

> . symlinks linking to just ".." work (can be used repeatedly to walk up the tree).

Good point. You seem to be right about that.

> . symlinks pointing to absolute paths can't be used.

All "suspicious" symlinks are ignored / not extracted.

> . path checking doesn't check for either "./../foo" or "xyz/../../foo" type attacks.

It does. The patch uses normpath() to normalize paths, hence both of your example paths get normalized to ../foo and are tested after normalization.

> . I'm pretty sure self._check_path(os.path.join(tarinfo.name, tarinfo.linkname)) is wrong, I think they meant "current path inside the tarfile" not "path of symlink" ... so that "foo.html -> ../src/foo.html" works if the link is in a docs directory.

Yes, that part seems to be incorrect... I assume tarinfo.name is dir/in/tarfile/symlink. So in your example tarinfo.name would be docs/foo.html, and join + normpath would yield docs/src/foo.html ... not correct.

The current upstream opinion seems to be that the module's behavior conforms to standards and it just should not be used to extract archives from untrusted sources.
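The normalization behaviour and the docs/foo.html example can be checked directly with posixpath (tar member names use "/" separators). The two helper names are hypothetical; the second assumes symlink targets are resolved relative to the directory containing the link, as on POSIX systems.

```python
import posixpath


def target_as_proposed(name, linkname):
    # join + normpath on the symlink's own path, as in the proposed patch.
    return posixpath.normpath(posixpath.join(name, linkname))


def target_from_link_dir(name, linkname):
    # Resolve relative to the directory that contains the symlink.
    return posixpath.normpath(posixpath.join(posixpath.dirname(name), linkname))


if __name__ == "__main__":
    # normpath() collapses "." and ".." lexically, so both example paths
    # reach the check in the same form.
    for p in ("./../foo", "xyz/../../foo"):
        print(p, "->", posixpath.normpath(p))  # both -> ../foo

    # The "docs/foo.html -> ../src/foo.html" link from the comment above.
    print(target_as_proposed("docs/foo.html", "../src/foo.html"))    # docs/src/foo.html
    print(target_from_link_dir("docs/foo.html", "../src/foo.html"))  # src/foo.html
```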
> . symlinks pointing to absolute paths can't be used.
> All "suspicious" symlinks are ignored / not extracted.

Right, it's just that I could see it being "common" to have symlinks to /home/* or /var/*, so people might treat that as a regression. But, yeah, it's also possible people have a use for tarfiles with symlinks to ../foo in them.

> . path checking doesn't check for either "./../foo" or "xyz/../../foo" type attacks.
> It does. Patch is using normpath()

Ahh, my bad. I don't see how that can do the right thing in the general case, but it'll be secure :).

> Current upstream opinion seems to be that module's behavior conforms to standards and it just should not be used to extract archives from untrusted sources.

Fair enough. I'm not sure we should care any more than upstream, then (although maybe the documentation should say more specifically not to use the module with any tarfiles you haven't created).
The upstream bug report was resolved with the following update to the documentation:

Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of 'path', e.g. members that have absolute filenames starting with "/" or filenames with two dots "..".

There is probably not much we can do without diverging significantly from the upstream version.
Upstream has resolved this issue (http://bugs.python.org/issue1044#msg55464):

"After careful consideration and a private discussion with Martin I do no longer think that we have a security issue here. tarfile.py does nothing wrong, its behaviour conforms to the pax definition and pathname resolution guidelines in POSIX. There is no known or possible practical exploit. I update the documentation with a warning, that it might be dangerous to extract archives from untrusted sources. That is the only thing to be done IMO."

And the documentation "fix":
http://svn.python.org/view/python/trunk/Doc/library/tarfile.rst?r1=57764&r2=57763&pathrev=57764

If upstream does not feel this is a security issue, as stated in comment #4, neither should we.