Bug 1194473
Summary: | UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 4: invalid start byte [ | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Adam Miller <admiller> |
Component: | python | Assignee: | Python Maintainers <python-maint> |
Status: | CLOSED WONTFIX | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.0 | CC: | admiller, bkabrda, bnater, mgoldman, smilner, ttomecek |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-01-11 17:59:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Adam Miller
2015-02-19 22:25:43 UTC
For reference, the centos7:httpd image used to reproduce this is built from the Dockerfile found here: https://github.com/CentOS/CentOS-Dockerfiles/tree/master/httpd/centos7

So it seems that this fails when UTF-8 decoding a PAX header value that is not valid UTF-8. Docker registry has a test that shows what this may look like [1]. Python 2 just assumes that PAX headers are UTF-8 and fails on a decoding error, while Python 3, by default, uses "surrogateescape" [2], which keeps undecodable bytes as they are and encodes them back properly on a subsequent str.encode() call. We'll try to see if we can mimic this behaviour in Python 2, but I can't promise this is actually doable.

Note for self: it seems that docker registry (which runs on Python 2) has a test that expects this failure [3].

For now, an obvious workaround is using python33 from RHSCL.

[1] https://github.com/docker/docker-registry/blob/09e9e20cd1d3df3f1435e970c50d7d7f775045f6/tests/test_tarfile.py#L78
[2] https://docs.python.org/3.4/library/codecs.html#error-handlers
[3] https://github.com/docker/docker-registry/blob/09e9e20cd1d3df3f1435e970c50d7d7f775045f6/tests/test_tarfile.py#L23

Additional info: the PAX header that causes this is called "SCHILY.xattr.security.capability".

More info: Docker registry actually approaches this a bit differently, by monkeypatching the _proc_pax method of tarfile, and there's a whole bug opened around that [1]. I guess we could provide a monkeypatch that would work the same way if imported in RHEL 7 Python 2.7. We'll try to see if we have more options and if not, we'll probably go ahead with this.

[1] https://github.com/docker/docker-registry/pull/381

After rethinking this carefully and discussing with other folks at python-maint, here's the status of this bug:

- As noted above, the tarfile module should not be patched directly, in order to preserve backwards compatibility and avoid any possible regressions.
- Adding a distro-specific monkeypatching module into Python's stdlib for this would not be wise. We want to stay as close to vanilla Python as possible and we should not add a divergent patch (one that would only work on RHEL).

There are two systematic ways to solve this that we can see:

1) We would like to encourage everyone to move off system Python and use RHSCL if at all possible. RHSCL has Python 3.3, which would solve your problem as noted in comment 3. Adam, would using RHSCL work for you? If not, could you specify why? Assuming you can use it, I'll close this bug as wontfix.
2) We could theoretically create a standalone package with the monkeypatch and upload it to PyPI, then we'd package it as an RPM for (Fedora and) RHEL. The advantage would be that the runtime environment of your package would also be reproducible on Python 2 on other platforms, simply by installing this package. Furthermore, it'd be available for other people with similar issues to use.

I consider 2) to be a secondary solution and would very much prefer to solve this by using RHSCL as noted in 1). Adam, is this possible for you? Thanks.

I'm seeing the same issue here: https://github.com/goldmann/docker-scripts/issues/13

I've hit this too and created an upstream issue: http://bugs.python.org/issue26740

I believe we've hit this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1451697
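
For readers who want to see the decoding difference discussed in this bug in isolation, here is a minimal sketch. It assumes Python 3, since the "surrogateescape" error handler does not exist in Python 2.7. The byte string is a made-up stand-in for a binary "SCHILY.xattr.security.capability" value, chosen only so that byte 0xc0 sits at position 4 as in the summary; it is not taken from the actual image. The helper names are hypothetical and are not part of tarfile.

```python
# Hypothetical xattr payload (an assumption for illustration only):
# binary data with byte 0xc0 at offset 4, which is never valid in UTF-8.
RAW_VALUE = b"\x01\x00\x00\x02\xc0\x00\x00\x00"


def decode_strict(raw):
    """Decode the way the failing code path does: strict UTF-8."""
    return raw.decode("utf-8")


def decode_lenient(raw):
    """Decode with 'surrogateescape': undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def encode_back(text):
    """Turn the escaped text back into the original bytes."""
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    try:
        decode_strict(RAW_VALUE)
    except UnicodeDecodeError as exc:
        # Same class of error as in the bug summary.
        print("strict decoding failed:", exc)

    text = decode_lenient(RAW_VALUE)
    # The round trip restores the original bytes unchanged.
    print("round trip ok:", encode_back(text) == RAW_VALUE)
```

Note that the round trip only works when the same error handler is passed on the encode side as well; encoding the escaped string with plain strict UTF-8 raises UnicodeEncodeError.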