Description of problem:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 4: invalid start byte
Version-Release number of selected component (if applicable):
Steps to Reproduce:
The steps to reproduce are mentioned in the "Notes" section here:
I originally thought it was a bug in docker creating a malformed tarball, or docker-py making a bad call, but neither panned out. The same code works in Python 3, so it looks as though this is a Python 2 issue.
Traceback (most recent call last):
  File "./dis", line 269, in <module>
    create_chroot(working_dir, from_chroot, image_layers)
  File "./dis", line 145, in create_chroot
  File "/usr/lib64/python2.7/tarfile.py", line 2041, in extractall
    for tarinfo in members:
  File "/usr/lib64/python2.7/tarfile.py", line 2471, in next
    tarinfo = self.tarfile.next()
  File "/usr/lib64/python2.7/tarfile.py", line 2319, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib64/python2.7/tarfile.py", line 1242, in fromtarfile
  File "/usr/lib64/python2.7/tarfile.py", line 1264, in _proc_member
  File "/usr/lib64/python2.7/tarfile.py", line 1394, in _proc_pax
    value = value.decode("utf8")
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 4: invalid start byte
For reference, the centos7:httpd image used to reproduce this is built from the Dockerfile found here:
So it seems that this fails while decoding a PAX header value as UTF-8, because the value is not valid UTF-8. Docker registry has a test that shows what this may look like.
Python 2 simply assumes that PAX header values are UTF-8 and fails on a decoding error. Python 3's tarfile, by default, uses the "surrogateescape" error handler, which keeps undecodable bytes as lone surrogate code points and turns them back into the original bytes when the string is later encoded with the same error handler. We'll try to see if we can mimic this behaviour in Python 2, but I can't promise this is actually doable.
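To illustrate the difference, here is a minimal Python 3 sketch. The byte string is a made-up payload (not the real header value from the image) chosen so that 0xc0 sits at offset 4, matching the error message; the helper names are hypothetical.

```python
# Illustrative payload only: 0xc0 at offset 4 can never start a UTF-8 sequence.
RAW = b"\x01\x00\x00\x02\xc0\x00\x00\x00"


def try_strict(data):
    """Decode the way Python 2's tarfile does: strict UTF-8.

    Returns the decoded text, or the UnicodeDecodeError if decoding fails.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


def roundtrip(data):
    """Decode with surrogateescape, then encode back with the same handler."""
    text = data.decode("utf-8", "surrogateescape")
    return text, text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    print(repr(try_strict(RAW)))  # the strict decode fails
    text, restored = roundtrip(RAW)
    print(repr(text))             # undecodable byte kept as a lone surrogate
    print(restored == RAW)        # original bytes are recovered
```

Note that the bytes are only recovered if the encode step also passes "surrogateescape"; a plain text.encode("utf-8") would raise UnicodeEncodeError on the lone surrogate.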
Note for self: it seems that docker registry (that runs on Python 2) has a test that expects this failure.
For now, an obvious workaround is using python33 from RHSCL.
Additional info: the PAX header that causes this is called "SCHILY.xattr.security.capability".
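For anyone who wants to poke at this without pulling the image, the following Python 3 sketch builds an in-memory archive carrying that header name and reads it back. It is only an approximation of the Docker layer: the payload bytes, the member name and the helper names are hypothetical, and Python's own writer may mark the extended header as binary (an hdrcharset record), which the tarball from Docker does not necessarily do.

```python
import io
import tarfile

KEY = "SCHILY.xattr.security.capability"
# Hypothetical payload: not valid UTF-8 because of the 0xc0 byte at offset 4.
RAW_VALUE = b"\x01\x00\x00\x02\xc0\x00\x00\x00"


def build_archive():
    """Return the bytes of a PAX-format tar with one empty member
    that carries a non-UTF-8 extended header value."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      encoding="utf-8") as tar:
        info = tarfile.TarInfo(name="usr/sbin/httpd")  # hypothetical member
        info.size = 0
        # Smuggle the raw bytes into a str via surrogateescape.
        info.pax_headers = {KEY: RAW_VALUE.decode("utf-8", "surrogateescape")}
        tar.addfile(info)
    return buf.getvalue()


def read_pax_headers(data):
    """Map each member name to its PAX headers, using Python 3 defaults
    (tarfile.open uses errors="surrogateescape" unless told otherwise)."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r",
                      encoding="utf-8") as tar:
        return {member.name: dict(member.pax_headers) for member in tar}


if __name__ == "__main__":
    headers = read_pax_headers(build_archive())
    value = headers["usr/sbin/httpd"][KEY]
    print(value.encode("utf-8", "surrogateescape") == RAW_VALUE)
```

Iterating over the archive is the step that raises UnicodeDecodeError in Python 2.7's _proc_pax for the archive from this report; in Python 3 it completes and the header value can be turned back into its original bytes.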
More info: Docker registry actually approaches this a bit differently, by monkeypatching the _proc_pax method of tarfile, and there's a whole bug opened around that. I guess we could provide a monkeypatch that would work the same way if imported in RHEL 7 Python 2.7. We'll try to see if we have more options and if not, we'll probably go ahead with this.
After rethinking this carefully and discussing with other folks at python-maint, here's the status of this bug:
- As noted above, we should not patch the tarfile module directly, in order to preserve backwards compatibility and avoid any possible regressions.
- Adding a distro specific monkeypatching module into Python's stdlib for this would not be wise. We want to stay as close to vanilla Python as possible and we should not add a divergent patch (one that would only work on RHEL).
There are two systematic ways to solve this that we can see:
1) We would like to encourage everyone to move out of system Python and use RHSCL if at all possible. RHSCL has Python 3.3 which would solve your problem as noted in comment 3. Adam, would using RHSCL work for you? If not, could you specify why? Assuming you can use it, I'll close this bug as wontfix.
2) We could theoretically create a standalone package with the monkeypatch and upload it to PyPI, then we'd package it as an RPM for (Fedora and) RHEL. The advantage would be that the runtime environment of your package would also be reproducible on Python 2 on other platforms, simply by installing this package. Furthermore, it'd be available for other people with similar issues to use.
I consider 2) to be a secondary solution and would very much prefer to solve this by using RHSCL as noted in 1). Adam, is this possible for you? Thanks.
I'm seeing the same issue here: https://github.com/goldmann/docker-scripts/issues/13
I've hit this too and created upstream issue: http://bugs.python.org/issue26740
I believe we've hit this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1451697