Bug 1404918 - Proposal: force C.UTF-8 when Python 3 is run under the C locale
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: python3
Version: 26
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Charalampos Stratakis
QA Contact: Fedora Extras Quality Assurance
Whiteboard: AcceptedFreezeException
Keywords: Reopened
Depends On:
Blocks: F26AlphaFreezeException F26FinalFreezeException 1432866
Reported: 2016-12-15 01:09 EST by Nick Coghlan
Modified: 2018-03-26 11:19 EDT
CC List: 18 users

See Also:
Fixed In Version: python3-3.6.0-21.fc26 python3-3.6.1-6.fc26 python3-3.6.1-8.fc26
Last Closed: 2017-06-29 19:29:16 EDT
Type: Bug


Attachments
Proposed patch to the Fedora Python 3.6 binary for F26 (deleted)
2016-12-15 01:09 EST, Nick Coghlan
Initially proposed patch to the Fedora Python 3.6 binary for F26 (1.51 KB, patch)
2016-12-15 01:11 EST, Nick Coghlan
Draft implementation with environment based off switch and test cases (12.41 KB, patch)
2016-12-18 00:27 EST, Nick Coghlan


External Trackers
Tracker ID Priority Status Summary Last Updated
Python 28180 None None None 2016-12-15 01:30 EST

Description Nick Coghlan 2016-12-15 01:09:58 EST
When run under the C locale, Python 3 doesn't really work properly on systems where UTF-8 is the correct encoding for interacting with the rest of the system. This is described in detail by Armin Ronacher in the click documentation: http://click.pocoo.org/5/python3/#python-3-surrogate-handling

The attached patch is a proposed change to the system Python that assumes the current process is misconfigured when it detects that "LC_CTYPE" refers to the "C" locale, and in that case prints a warning to stderr and forces the use of the C.UTF-8 locale instead.

To avoid unintended side effects, it *solely* changes the actual python3.6 command line utility - nothing changes for cases where CPython is used as a dynamically linked library.

Behaviour with the patch:

```
$ LANG=C python -c 'import click; cli = click.command()(lambda:None); cli()'
Python detected LC_CTYPE=C. Setting LC_ALL & LANG to C.UTF-8.
```

Behaviour without the patch:

```
$ LANG=C /usr/bin/python3 -c 'import click; cli = click.command()(lambda:None); cli()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ncoghlan/.local/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/home/ncoghlan/.local/lib/python3.5/site-packages/click/core.py", line 675, in main
    _verify_python3_env()
  File "/home/ncoghlan/.local/lib/python3.5/site-packages/click/_unicodefun.py", line 119, in _verify_python3_env
    'mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment.  Either run this under Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.

This system supports the C.UTF-8 locale which is recommended.
You might be able to resolve your issue by exporting the
following environment variables:

    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8
```
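
The detection and coercion step described above can be modelled roughly as follows. This is only a Python sketch of the idea, not the attached C patch: the helper names `effective_ctype` and `coerce_as_first_proposed` are hypothetical, and resolving the locale from environment variables (rather than from the result of `setlocale`) is an assumption made to keep the example self-contained.

```python
# Hypothetical model of the first proposal; the real patch is C code
# inside the python3.6 binary and may detect the locale differently.
C_LOCALE_NAMES = {"C", "POSIX"}
TARGET_LOCALE = "C.UTF-8"


def effective_ctype(env):
    """Resolve LC_CTYPE with POSIX precedence: LC_ALL, then LC_CTYPE, then LANG."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables both fall through
            return value
    return "C"  # nothing configured: the default C locale applies


def coerce_as_first_proposed(env):
    """Return (new_env, warning) as the initial patch is described to behave."""
    if effective_ctype(env) not in C_LOCALE_NAMES:
        return dict(env), None
    new_env = dict(env, LC_ALL=TARGET_LOCALE, LANG=TARGET_LOCALE)
    warning = "Python detected LC_CTYPE=C. Setting LC_ALL & LANG to C.UTF-8."
    return new_env, warning


if __name__ == "__main__":
    print(coerce_as_first_proposed({"LANG": "C"}))
    print(coerce_as_first_proposed({"LANG": "en_US.UTF-8"}))
```

Under this model, a process started with only `LANG=C` ends up with both `LC_ALL` and `LANG` set to `C.UTF-8`, while a process with a non-C locale is left untouched.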
Comment 1 Nick Coghlan 2016-12-15 01:11 EST
Created attachment 1231981
Initially proposed patch to the Fedora Python 3.6 binary for F26
Comment 2 Miro Hrončok 2016-12-15 06:22:26 EST
We plan to apply the patch after the current Python 3.6 side-tag is merged.
Comment 3 Nick Coghlan 2016-12-15 06:40:16 EST
Note from the current Python SIG discussion: this needs an environment variable to turn off the second-guessing behaviour.

e.g. PYTHONALLOWCLOCALE
Comment 4 Miro Hrončok 2016-12-15 07:04:01 EST
...considering the discussion of course. Didn't have time now to read it all.
Comment 5 Nick Coghlan 2016-12-15 08:52:43 EST
Consolidating the feedback from the mailing list thread [1] so far:

To better cover the runtime embedding cases, add a new warning message inside Py_Initialize that says:

    libpython3 detected LC_CTYPE=C. Some libraries and operating system interfaces may not work correctly.
    Use `PYTHONALLOWCLOCALE=1 LC_CTYPE=C /usr/bin/python3` if debugging this under /usr/bin/python3.

For the command line interpreter, provide the `PYTHONALLOWCLOCALE` off switch, and adjust the warning message as follows:

    Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set PYTHONALLOWCLOCALE to disable this behaviour)

And if the environment variable is already set:

    Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. Some libraries, applications and operating system interfaces may not work correctly.

[1] https://lists.fedoraproject.org/archives/list/python-devel@lists.fedoraproject.org/thread/NBYPZLLAA7SNHOZ4TYMDTLJIKACLVTUM/
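
The choice between the two command line messages above could be sketched like this. The function name `cli_warning` is hypothetical, and treating any non-empty value of `PYTHONALLOWCLOCALE` as "set" is an assumption, since the exact check is not spelled out here.

```python
# Hypothetical sketch of the CLI message selection; message texts are
# copied from the proposal above, the function itself is illustrative.
CLI_COERCION_WARNING = (
    "Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 "
    "(set PYTHONALLOWCLOCALE to disable this behaviour)"
)
CLI_ALLOWED_WARNING = (
    "Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. Some "
    "libraries, applications and operating system interfaces may not "
    "work correctly."
)


def cli_warning(ctype_is_c, env):
    """Pick the stderr message for the standalone interpreter, or None."""
    if not ctype_is_c:
        return None  # nothing to second-guess
    if env.get("PYTHONALLOWCLOCALE"):  # assumption: non-empty means "set"
        return CLI_ALLOWED_WARNING
    return CLI_COERCION_WARNING


if __name__ == "__main__":
    print(cli_warning(True, {}))
    print(cli_warning(True, {"PYTHONALLOWCLOCALE": "1"}))
```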
Comment 6 Jan Niklas Hasse 2016-12-15 09:59:45 EST
Why would one want to allow the C locale though? Is there any case where LANG=C works but C.utf-8 doesn't (for Python 3)?
Comment 7 Toshio Ernie Kuratomi 2016-12-15 10:22:05 EST
(In reply to Jan Niklas Hasse from comment #6)
> Why would one want to allow the C locale though? Is there any case where
> LANG=C works but C.utf-8 doesn't (for Python 3)?

The use case is debugging.  These are the three cases we've come up with in the mailing list thread where it would be desirable to use C locale instead of C.utf-8 for debugging:

* I am a software developer and the user is running my software with python-3.6 on a distribution that doesn't patch their Python3.
* I am a software developer and the user is running my software on an older version of python-3.x that doesn't have this change.
* I am a software developer and I'm running in production under mod_wsgi but debugging/running unittests/etc using /usr/bin/python3.

In those cases, the production version of the software won't be coercing to a non-ascii-aware locale.  Being able to turn off the coercion when running /usr/bin/python3 is the quickest way to replicate the problems that can be encountered in the production environment.
Comment 8 Nick Coghlan 2016-12-18 00:27 EST
Created attachment 1233034
Draft implementation with environment based off switch and test cases

The updated PYTHONALLOWCLOCALE patch covers everything discussed both here and in the SIG thread, and also adds a new test case for the behaviour.

The current patch refactors test.support.script_helper slightly, but we should probably just duplicate that code to make the patch easier to maintain and leave any refactoring for the upstream implementation.

Example behaviour:
==========================

$ ./python -c "import sys; print(sys.getfilesystemencoding())"
utf-8

$ LANG=C.UTF-8 ./python -c "import sys; print(sys.getfilesystemencoding())"
utf-8

$ LANG=C ./python -c "import sys; print(sys.getfilesystemencoding())"
Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set PYTHONALLOWCLOCALE to disable this behaviour).
utf-8

$ PYTHONALLOWCLOCALE=1 LANG=C ./python -c "import sys; print(sys.getfilesystemencoding())"
Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. Some libraries, applications, and operating system interfaces may not work correctly.
Py_Initialize detected LC_CTYPE=C, which limits Unicode compatibility. Some libraries and operating system interfaces may not work correctly. Use `PYTHONALLOWCLOCALE=1 LC_CTYPE=C python3` to configure a similar environment when running Python directly.
ascii
==========================

The reason the library warning also shows up in the last example is that from the library's point of view, that CLI invocation looks exactly the same as any other embedding application with a problematic locale configuration. One possible option to make that case a bit more readable would be to omit the CLI warning for it, and rely solely on the warning from the library.
Comment 9 Nick Coghlan 2016-12-18 00:36:45 EST
I also made an interesting discovery while working on this patch: the Py_Initialize code already includes a call to `setlocale(LC_CTYPE, "")` that never gets reverted (the runtime doesn't even save a reference to the old setting for subsequent restoration).

So that means embedding applications already have to set `LC_ALL`, `LC_CTYPE` or `LANG` in the environment if they want an embedded CPython 3 runtime to pay attention to it - they can't just call `setlocale()` before calling Py_Initialize.
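
For comparison, the save-and-restore pattern that the runtime does not apply would look roughly like this at the Python level. It is a sketch using the standard `locale` module, with a hypothetical helper name `preserved_lc_ctype`; it illustrates the pattern only and says nothing about how Py_Initialize itself is written.

```python
import locale
from contextlib import contextmanager


@contextmanager
def preserved_lc_ctype():
    """Remember the current LC_CTYPE setting and put it back on exit."""
    saved = locale.setlocale(locale.LC_CTYPE)  # no second argument: query only
    try:
        yield saved
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    with preserved_lc_ctype() as before:
        locale.setlocale(locale.LC_CTYPE, "C")  # temporary change
        print("inside:", locale.setlocale(locale.LC_CTYPE))
    print("restored:", locale.setlocale(locale.LC_CTYPE) == before)
```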
Comment 10 Thomas Spura 2016-12-19 01:27:41 EST
Does this also improve the handling of encoding change from utf-8 to ascii if stdout is not a tty from bug #1397428?
Comment 11 Petr Viktorin 2016-12-19 05:16:23 EST
Thomas, bug 1397428 is for Python 2. In py3 it should be fixed already.
Comment 12 Thomas Spura 2016-12-20 01:01:40 EST
(In reply to Petr Viktorin from comment #11)
> Thomas, bug 1397428 is for Python 2. In py3 it should be fixed already.

OK, thanks for the clarification.
Comment 13 Nick Coghlan 2016-12-27 22:38:16 EST
I've now created an upstream PEP targeting Python 3.7 for this: https://www.python.org/dev/peps/pep-0538/

An updated patch (which tweaks the warning messages a bit and avoids emitting the double warning when PYTHONALLOWCLOCALE is set) is attached to the corresponding upstream issue: http://bugs.python.org/issue28180

Assuming that gets accepted some time in the next few weeks, would it make sense to file a Self-Contained Change Proposal for F26 to cover the backport to Python 3.6?
Comment 14 Charalampos Stratakis 2017-01-02 09:49:47 EST
I suppose that a self contained change is not required for that kind of patch. Currently, if that gets accepted upstream we will backport it in rawhide.
Comment 15 Charalampos Stratakis 2017-02-14 08:50:27 EST
I'm creating the self contained change page.

However, when I tried to apply the patch, compilation failed with:

/builddir/build/BUILD/Python-3.6.0/Python/pylifecycle.c: In function '_emit_stderr_warning_for_c_locale':
/builddir/build/BUILD/Python-3.6.0/Python/pylifecycle.c:315:9: error: format not a string literal and no format arguments [-Werror=format-security]
         fprintf(stderr, _C_LOCALE_WARNING);
Comment 16 Charalampos Stratakis 2017-02-14 10:17:00 EST
Fixed by changing these two fprintf statements from:

fprintf(stderr, _CLI_C_LOCALE_COERCION_WARNING);

to:

fprintf(stderr, "%s", _CLI_C_LOCALE_COERCION_WARNING);
Comment 17 Charalampos Stratakis 2017-02-23 10:06:31 EST
Self-contained change proposal:

https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale
Comment 18 Jan Niklas Hasse 2017-02-23 10:14:52 EST
What happens with the patch when LANG is unset?

> In those cases, the production version of the software won't be coercing to a non-ascii-aware locale.  Being able to turn off the coercion when running /usr/bin/python3 is the quickest way to replicate the problems that can be encountered in the production environment.

Couldn't LANG=C.ascii be used in these cases?
Comment 19 Fedora End Of Life 2017-02-28 05:47:27 EST
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.
Comment 20 Fedora Blocker Bugs Application 2017-03-10 12:09:39 EST
Proposed as a Freeze Exception for 26-alpha by Fedora user cstratak using the blocker tracking app because:

 As the scheduling didn't work out, I will have to request a freeze exception for https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale in order to be tested extensively.
Comment 21 Fedora Update System 2017-03-13 14:14:40 EDT
python3-3.6.0-21.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-904a280f5f
Comment 22 Mike Ruckman 2017-03-13 14:53:11 EDT
Discussed in today's Blocker Review meeting. FESCo has declared this an FE: https://meetbot-raw.fedoraproject.org/teams/fesco/fesco.2017-03-10-16.02.html
Comment 23 Fedora Update System 2017-03-13 23:22:22 EDT
python3-3.6.0-21.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-904a280f5f
Comment 24 Fedora Update System 2017-03-14 11:16:27 EDT
autoconf-archive-2016.09.16-3.fc26 python3-3.6.0-21.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-904a280f5f
Comment 25 Fedora Update System 2017-03-15 00:21:17 EDT
autoconf-archive-2016.09.16-3.fc26, python3-3.6.0-21.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-904a280f5f
Comment 26 Fedora Update System 2017-03-16 21:07:38 EDT
autoconf-archive-2016.09.16-3.fc26, python3-3.6.0-21.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Comment 27 Nick Coghlan 2017-05-06 10:43:12 EDT
Progress update on the upstream PEP: it's getting pretty close to acceptance, but the latest round of reviews prompted a change to make the behaviour more consistent between the "locale coercion" case and the "explicit locale configuration" case.

Specifically, instead of calling Py_SetStandardStreamEncoding from the standalone CLI, the language runtime initialization now automatically uses "surrogateescape" on the standard streams whenever the configured locale is one of the potential coercion target locales (similar to the way the default error handler for the standard streams has been set to "surrogateescape" in the C locale since Python 3.5).

As a non-functional change, the upstream patch has also been refactored to move all of the implementation details into the shared library, with just a couple of private API functions accessed from the standalone CLI implementation.

The changes for both of those relative to the previous patch: https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5

The bulk of the related specification changes are in https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398 with an important clarification in https://github.com/python/peps/commit/4067701b851031b10a37300375735f8489afb4e6 to note that the handling of sys.stderr isn't changed (that continues to use "backslashreplace" as its error handler, regardless of locale).
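
To make the difference between the two error handlers concrete, here is a small sketch using plain `bytes`/`str` codecs rather than the real standard streams. The helper names are hypothetical; the handler names "surrogateescape" and "backslashreplace" are the standard Python ones mentioned above.

```python
def roundtrip_surrogateescape(raw):
    """Decode possibly invalid UTF-8 and encode it back without data loss."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return text, text.encode("utf-8", errors="surrogateescape")


def encode_like_stderr(text):
    """Encode with backslashreplace, the handler sys.stderr keeps using."""
    return text.encode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
    text, back = roundtrip_surrogateescape(raw)
    print(ascii(text))               # the bad byte becomes a lone surrogate
    print(back == raw)               # the original bytes are recovered
    print(encode_like_stderr(text))  # the surrogate is written as an escape
```

With "surrogateescape" the undecodable byte survives a round trip through `str`, whereas "backslashreplace" never raises on output but writes a visible escape sequence instead of the original byte.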

With the Beta freeze only a week away, it probably makes sense to update the backport ASAP - the only remaining open question upstream is an idea we wouldn't backport to 3.6 anyway (specifically, I've suggested we consider making the private configuration API in the current patch a public API in Python 3.7)
Comment 28 Charalampos Stratakis 2017-05-06 13:29:13 EDT
(In reply to Nick Coghlan from comment #27)
> Progress update on the upstream PEP: it's getting pretty close to
> acceptance, but the latest round of reviews prompted a change to make the
> behaviour more consistent between the "locale coercion" case and the
> "explicit locale configuration case".
> 
> Specifically, instead of calling Py_SetStandardStreamEncoding from the
> standalone CLI, the language runtime initialization now automatically uses
> "surrogateescape" on the standard streams whenever the configured locale is
> one of the potential coercion target locales (similar to the way the default
> error handler for the standard streams has been set to "surrogateescape" in
> the C locale since Python 3.5).
> 
> As a non-functional change, the upstream patch has also been refactored to
> move all of the implementation details into the shared library, with just a
> couple of private API functions accessed from the standalone CLI
> implementation.
> 
> The changes for both of those relative to the previous patch:
> https://github.com/ncoghlan/cpython/commit/
> 188e7807b6d9e49377aacbb287c074e5cabf70c5
> 
> The bulk of the related specification changes are in
> https://github.com/python/peps/commit/
> 2fb53e7c1bbb04e1321bca11cc0112aec69f6398 with an important clarification in
> https://github.com/python/peps/commit/
> 4067701b851031b10a37300375735f8489afb4e6 to note that the handling of
> sys.stderr isn't change (that continues to use "backslashreplace" as its
> error handler, regardless of locale)
> 
> With the Beta freeze only a week away, it probably makes sense to update the
> backport ASAP - the only remaining open question upstream is an idea we
> wouldn't backport to 3.6 anyway (specifically, I've suggested we consider
> making the private configuration API in the current patch a public API in
> Python 3.7)

Thanks for the update Nick. I'll update the backport as soon as possible.
Comment 30 Nick Coghlan 2017-05-08 23:36:13 EDT
Thanks. Based on Inada-san's comments, there's likely to be at least one more notable change in the upstream version: changing the locale coercion to only set LANG and LC_CTYPE without setting the LC_ALL override.

Context for that: https://mail.python.org/pipermail/python-dev/2017-May/147896.html

There are a couple of additional changes being considered upstream for 3.7 (removing the coercion warning in 3.8, exposing the legacy locale detection and coercion as a public CPython API), but neither of those would affect the Fedora 3.6 backport.
Comment 31 Nick Coghlan 2017-05-09 03:23:08 EDT
OK, I've pushed the code update to my sandbox branch that changes the locale coercion to always respect LC_ALL rather than attempting to override it: https://github.com/ncoghlan/cpython/commit/476a78133c94d82e19b89f50036cecd9b4214e7a

If you set LC_ALL=C, CPython won't attempt to change it, but will complain about it.

If you set PYTHONCOERCECLOCALE=0, CPython not only won't change the C locale, but won't complain about it either.

Since the locale coercion now only sets LC_CTYPE & LANG, that means it also respects other explicitly set locale categories (like LC_TIME, LC_CURRENCY and LC_MONETARY).
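
The revised rules in this comment can be summarised in a short sketch. As before, this is a hypothetical Python model (the function name `coerce_revised` and the returned action labels are made up for illustration), it works from environment variables only, and it assumes `C.UTF-8` is the coercion target.

```python
C_LOCALE_NAMES = {"C", "POSIX"}
TARGET_LOCALE = "C.UTF-8"  # assumed coercion target


def coerce_revised(env):
    """Return (new_env, action), where action is 'none', 'complain' or 'coerced'."""
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return dict(env), "none"  # no change and no complaint
    lc_all = env.get("LC_ALL")
    if lc_all:
        # LC_ALL is always respected; an explicit C locale only draws a complaint.
        action = "complain" if lc_all in C_LOCALE_NAMES else "none"
        return dict(env), action
    effective = env.get("LC_CTYPE") or env.get("LANG") or "C"
    if effective not in C_LOCALE_NAMES:
        return dict(env), "none"
    # Only LC_CTYPE and LANG are set, so categories such as LC_TIME survive.
    return dict(env, LC_CTYPE=TARGET_LOCALE, LANG=TARGET_LOCALE), "coerced"


if __name__ == "__main__":
    print(coerce_revised({"LANG": "C", "LC_TIME": "de_DE.UTF-8"}))
    print(coerce_revised({"LC_ALL": "C"}))
    print(coerce_revised({"LANG": "C", "PYTHONCOERCECLOCALE": "0"}))
```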

I'll be pushing the corresponding update to the PEP itself shortly.
Comment 32 Nick Coghlan 2017-05-09 08:26:21 EDT
The published PEP has been updated and a new python-dev review thread started: https://mail.python.org/pipermail/python-dev/2017-May/147904.html
Comment 33 Fedora Update System 2017-05-09 14:48:32 EDT
python3-3.6.1-6.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-1e3062e0d6
Comment 34 Fedora Update System 2017-05-11 22:12:09 EDT
python3-3.6.1-6.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-1e3062e0d6
Comment 35 Fedora Update System 2017-05-14 16:18:42 EDT
python3-3.6.1-6.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Comment 36 Nick Coghlan 2017-05-28 03:33:18 EDT
I'm happy to report that PEP 538 is now accepted upstream: https://mail.python.org/pipermail/python-dev/2017-May/148035.html

There's one change relative to the behaviour of the downstream patch in the Fedora 26 Beta: upstream CPython will *only* set LC_CTYPE, and leave LANG alone.

The rationale for that final change is here: https://www.python.org/dev/peps/pep-0538/#avoiding-setting-lang-for-utf-8-locale-coercion

Rather than filing a new issue, I figure it makes sense to just set this one back to assigned, and then run it through the update cycle again.
Comment 37 Charalampos Stratakis 2017-06-27 05:29:10 EDT
The latest upstream implementation removes the warning when the locale has been coerced, thus proposing a freeze exception for the new build of python3.
Comment 38 Fedora Blocker Bugs Application 2017-06-27 05:30:09 EDT
Proposed as a Freeze Exception for 26-final by Fedora user cstratak using the blocker tracking app because:

 Update to the latest upstream implementation of PEP 538 (rhbz#1432866) which removed the warning about locale coercion to stderr.
Comment 39 Adam Williamson 2017-06-27 17:53:14 EDT
+1 FE, for me.
Comment 40 Fedora Update System 2017-06-28 06:59:52 EDT
python3-3.6.1-8.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-6b4e8ce90d
Comment 41 Fedora Update System 2017-06-28 15:20:49 EDT
python3-3.6.1-8.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-6b4e8ce90d
Comment 42 Fedora Update System 2017-06-29 19:29:16 EDT
python3-3.6.1-8.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
