Bug 1410175 - python3 fails to start on a RHEL-7 kernel with Fatal Python error: failed to get random numbers to initialize Python
Summary: python3 fails to start on a RHEL-7 kernel with Fatal Python error: failed to ...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: python3
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Charalampos Stratakis
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: PrioritizedBug
Depends On:
Blocks: 1410187
Reported: 2017-01-04 16:35 UTC by Kamil Dudka
Modified: 2017-01-12 10:10 UTC (History)

Fixed In Version: python3-3.6.0-3.fc26 python35-3.5.2-6.fc26 python34-3.4.5-3.fc26 python2-2.7.13-1.fc26
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-12 10:10:27 UTC
Type: Bug
Embargoed:


Description Kamil Dudka 2017-01-04 16:35:40 UTC
Description of problem:
It is no longer possible to build packages relying on python3 functionality on a RHEL-7 kernel.


Version-Release number of selected component (if applicable):
python3-3.6.0-1.fc26.x86_64
kernel-3.10.0-514.el7.x86_64
mock-1.2.21-1.el7.noarch


How reproducible:
100%


Steps to Reproduce:
1. Boot a RHEL-7 system.

2. Install EPEL-7 repositories:
# yum install -y https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm

3. Install mock:
# yum install -y mock

4. Set up a user for running mock:
# useradd -G mock mockuser
# su - mockuser

5. Install python3 into a Fedora Rawhide mock root:
$ mock -r fedora-rawhide-x86_64 --install python3

6. Attempt to start python3:
$ mock -r fedora-rawhide-x86_64 --shell python3


Actual results:
Fatal Python error: failed to get random numbers to initialize Python


Expected results:
python3 interpreter starts.


Additional info:
This breaks RHEL-7 builders/scanners that handle Fedora Rawhide packages.

Comment 1 Lukas Slebodnik 2017-01-04 16:56:12 UTC
And here is the reason why it failed:
(gdb) bt
#0  py_getentropy (raise=<optimized out>, size=24, buffer=<optimized out>) at /usr/src/debug/Python-3.6.0/Python/random.c:105
#1  pyurandom (blocking=<optimized out>, raise=<optimized out>, size=<optimized out>, buffer=<optimized out>) at /usr/src/debug/Python-3.6.0/Python/random.c:399
#2  _PyRandom_Init () at /usr/src/debug/Python-3.6.0/Python/random.c:479
#3  0x00007ffff7a61344 in Py_Main (argc=1, argv=0x555555757010) at /usr/src/debug/Python-3.6.0/Modules/main.c:379
#4  0x0000555555554b59 in main (argc=1, argv=<optimized out>) at /usr/src/debug/Python-3.6.0/Programs/python.c:69
(gdb) p res
$3 = -1
(gdb) p errno
$4 = 38

errno 38 is ENOSYS          38      /* Invalid system call number */

The problem is that py_getentropy was preferred over dev_urandom, which could use
the getrandom syscall or fall back to /dev/urandom.

(gdb) l 382,410
382     pyurandom(void *buffer, Py_ssize_t size, int blocking, int raise)
383     {
384         if (size < 0) {
385             if (raise) {
386                 PyErr_Format(PyExc_ValueError,
387                              "negative argument not allowed");
388             }
389             return -1;
390         }
391
392         if (size == 0) {
393             return 0;
394         }
395
396     #ifdef MS_WINDOWS
397         return win32_urandom((unsigned char *)buffer, size, raise);
398     #elif defined(PY_GETENTROPY)
399         return py_getentropy(buffer, size, raise);
400     #else
401         return dev_urandom(buffer, size, blocking, raise);
402     #endif
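
For illustration, the fallback that dev_urandom is described as providing can be sketched in Python rather than in the C of random.c. This is a minimal sketch, not the CPython implementation: the helper name random_bytes is hypothetical, and it assumes a POSIX system where /dev/urandom exists and where small reads return the full amount requested.

```python
import errno
import os


def random_bytes(size):
    """Return `size` random bytes, preferring getrandom() when it works.

    Hypothetical helper for illustration only; intended for small sizes.
    """
    getrandom = getattr(os, "getrandom", None)  # absent on non-Linux builds
    if getrandom is not None:
        try:
            # GRND_NONBLOCK: fail with EAGAIN instead of blocking when the
            # kernel entropy pool is not initialised yet.
            return getrandom(size, os.GRND_NONBLOCK)
        except OSError as exc:
            # ENOSYS: the running kernel has no getrandom() syscall.
            # EAGAIN: the pool is not ready and we asked not to block.
            if exc.errno not in (errno.ENOSYS, errno.EAGAIN):
                raise
    # Fallback: read the device file directly, unbuffered.
    with open("/dev/urandom", "rb", buffering=0) as dev:
        return dev.read(size)
```

On a kernel without the syscall, the first branch raises ENOSYS (errno 38, as in the gdb session above) and the function reads /dev/urandom instead of giving up.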

Comment 2 Stephen Gallagher 2017-01-04 17:02:22 UTC
Nominating for consideration as a high-priority bug.

Comment 3 Lukas Slebodnik 2017-01-04 17:05:11 UTC
BTW, the workaround is to define your own hash seed in the PYTHONHASHSEED environment variable:

sh# PYTHONHASHSEED=1234 python3.6
Python 3.6.0 (default, Dec 27 2016, 20:50:38) 
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getrandom(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 38] Function not implemented
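
A quick way to check that the seed is really fixed is to start two child interpreters with the same PYTHONHASHSEED and compare a string hash. This is only a sketch; the helper name hash_in_child is hypothetical and is not part of any package mentioned here.

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    """Run a child interpreter with PYTHONHASHSEED=seed and return hash('abc')."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout.decode().strip())
```

With the same seed, both children should print the same value. Note that this only covers interpreter start-up; as the session above shows, os.getrandom() itself still fails with ENOSYS on such a kernel.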

Comment 4 Lukas Slebodnik 2017-01-04 17:07:52 UTC
BTW, I can see the same issue with python3.5 from the package python35:

[root@a38d216e5b76 ~]# uname -a
Linux a38d216e5b76 3.10.0-534.el7.x86_64 #1 SMP Fri Dec 16 08:30:57 EST 2016 x86_64 x86_64 x86_64 GNU/Linux


[root@a38d216e5b76 ~]# cat /etc/os-release 
NAME=Fedora
VERSION="26 (Rawhide)"
ID=fedora
VERSION_ID=26
PRETTY_NAME="Fedora 26 (Rawhide)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:26"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=rawhide
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=rawhide
PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy

[root@a38d216e5b76 ~]# python3.5
Fatal Python error: getentropy() failed
Aborted (core dumped)

There is just a different backtrace:
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6c3b1ba in __GI_abort () at abort.c:89
#2  0x00007ffff7a4e18f in Py_FatalError (msg=msg@entry=0x7ffff7ae509e "getentropy() failed") at /usr/src/debug/Python-3.5.2/Python/pylifecycle.c:1385
#3  0x00007ffff7a54457 in py_getentropy (fatal=<optimized out>, size=24, buffer=<optimized out>) at /usr/src/debug/Python-3.5.2/Python/random.c:108
#4  _PyRandom_Init () at /usr/src/debug/Python-3.5.2/Python/random.c:442
#5  0x00007ffff7a6882c in Py_Main (argc=1, argv=0x555555757010) at /usr/src/debug/Python-3.5.2/Modules/main.c:369
#6  0x0000555555554b40 in main (argc=1, argv=<optimized out>) at /usr/src/debug/Python-3.5.2/Programs/python.c:65

Comment 5 Miro Hrončok 2017-01-04 17:08:20 UTC
Quick reproducer with centos and vagrant:

$ vagrant init centos/7
$ vagrant up
$ vagrant ssh
[vagrant@localhost ~]$ sudo yum ...

(Got the same result as Kamil.)

Comment 6 Victor Stinner 2017-01-04 17:11:15 UTC
Hi. I wrote a large part of Python/random.c. Many parts of this file are specific to a few platforms. For example, getentropy() is only expected to be available on OpenBSD and Solaris, but it's not used on Solaris:

/* Issue #25003: Don't use getentropy() on Solaris (available since
 * Solaris 11.3), it is blocking whereas os.urandom() should not block. */
#elif defined(HAVE_GETENTROPY) && !defined(sun)

It seems that getentropy() was recently exposed in glibc as a wrapper around the Linux getrandom() syscall:
https://sourceware.org/bugzilla/show_bug.cgi?id=17252

What is the version of your libc?

You can try this workaround which disables getentropy() on Linux:

diff -r ee1390c9b585 Python/random.c
--- a/Python/random.c   Wed Jan 04 12:02:30 2017 +0100
+++ b/Python/random.c   Wed Jan 04 18:08:28 2017 +0100
@@ -79,7 +79,7 @@ win32_urandom(unsigned char *buffer, Py_
 
 /* Issue #25003: Don't use getentropy() on Solaris (available since
  * Solaris 11.3), it is blocking whereas os.urandom() should not block. */
-#elif defined(HAVE_GETENTROPY) && !defined(sun)
+#elif defined(HAVE_GETENTROPY) && !defined(sun) && !defined(__linux__)
 #define PY_GETENTROPY 1
 
 /* Fill buffer with size pseudo-random bytes generated by getentropy().


The py_getrandom() function is more battle-tested: it handles EAGAIN, EINTR, EPERM and ENOSYS errors. So it should work for you.

Note: the getrandom() syscall was added in Linux kernel 3.17; your kernel is older (3.10) and so doesn't have this new syscall.
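
As a rough illustration of the version threshold, a kernel release string can be compared against 3.17. This is a sketch under assumptions: the function names are hypothetical, and a version comparison only reflects upstream kernels (a distribution kernel could in principle backport the syscall), so the ENOSYS error at runtime stays the reliable signal.

```python
import re


def kernel_version(release):
    """Parse (major, minor) from a release string like '3.10.0-514.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def upstream_has_getrandom(release):
    """True if an upstream kernel of this version has getrandom() (added in 3.17)."""
    return kernel_version(release) >= (3, 17)
```

On a live Linux system the release string can be obtained with platform.release(); for the kernel from this report, upstream_has_getrandom("3.10.0-514.el7.x86_64") evaluates to False.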

Comment 8 Lukas Slebodnik 2017-01-04 17:19:17 UTC
(In reply to Lukas Slebodnik from comment #4)
> BTW, I can see the same issue with python3.5 from the package python35
> 
I filed a different BZ for this: https://bugzilla.redhat.com/show_bug.cgi?id=1410187

Comment 9 Victor Stinner 2017-01-04 17:35:28 UTC
I proposed a patch upstream: http://bugs.python.org/issue29157

Python 2.7, 3.5, 3.6 and 3.7 are impacted by the issue if compiled with a recent glibc which has getentropy() but run on Linux < 3.17.

Comment 10 Kamil Dudka 2017-01-04 17:49:23 UTC
(In reply to Victor Stinner from comment #6)
> Note: the getrandom() syscall was added in Linux kernel 3.17; your kernel is
> older (3.10) and so doesn't have this new syscall.

Although I used RHEL-7 in the summary and the reproducer, please make sure that the solution will also work on RHEL-6 kernels (based on 2.6.32).  We still have some infrastructure that builds Fedora Rawhide packages on RHEL-6 machines in mock.

Comment 11 Lukas Slebodnik 2017-01-04 17:50:05 UTC
(In reply to Victor Stinner from comment #9)
> I proposed a patch upstream: http://bugs.python.org/issue29157
> 
> Python 2.7, 3.5, 3.6 and 3.7 are impacted by the issue if compiled with a
> recent glibc which has getentropy() but run on Linux < 3.17.

FYI, getentropy and getrandom were added in glibc-2.24.90-23.fc26; the build finished on Mon, 12 Dec 2016 23:35:36 UTC.
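
Whether the C library in a given chroot exposes these wrappers can be probed from Python with ctypes. This is a minimal sketch: the helper name libc_exports is hypothetical, and ctypes.CDLL(None) is assumed to behave as on POSIX systems, where it gives access to the symbols already loaded into the process. It inspects the libc present at runtime, not the one Python was compiled against.

```python
import ctypes


def libc_exports(name):
    """Return True if the libraries loaded into this process export `name`."""
    handle = ctypes.CDLL(None)  # POSIX: handle for the running program itself
    # ctypes raises AttributeError when the symbol cannot be resolved,
    # which hasattr() turns into False.
    return hasattr(handle, name)
```

For example, libc_exports("getentropy") and libc_exports("getrandom") would be expected to return True inside a chroot that has the glibc build named above, and False with an older glibc.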

Comment 12 Lukas Slebodnik 2017-01-04 17:56:10 UTC
(In reply to Kamil Dudka from comment #10)
> (In reply to Victor Stinner from comment #6)
> > Note: the getrandom() syscall was added in Linux kernel 3.17; your kernel is
> > older (3.10) and so doesn't have this new syscall.
> 
> Although I used RHEL-7 in the summary and the reproducer, please make sure
> that the solution will also work on RHEL-6 kernels (based on 2.6.32).  We
> still have some infrastructure that builds Fedora Rawhide packages on RHEL-6
> machines in mock.

The RHEL-7 kernel does not have the getrandom and getentropy syscalls, therefore older kernels cannot have them either. So if the patch works with the RHEL-7 kernel, it will work with the RHEL-6 kernel as well.

Comment 13 Lukas Slebodnik 2017-01-04 18:03:19 UTC
(In reply to Victor Stinner from comment #9)
> I proposed a patch upstream: http://bugs.python.org/issue29157
> 
> Python 2.7, 3.5, 3.6 and 3.7 are impacted by the issue if compiled with a
> recent glibc which has getentropy() but run on Linux < 3.17.

Fortunately, python2.7 has not been rebuilt with glibc-2.24.90-23.fc26 in rawhide.
Do you need a BZ for python2 as well?

Comment 14 Stephen Gallagher 2017-01-04 18:14:23 UTC
(In reply to Lukas Slebodnik from comment #13)
> (In reply to Victor Stinner from comment #9)
> > I proposed a patch upstream: http://bugs.python.org/issue29157
> > 
> > Python 2.7, 3.5, 3.6 and 3.7 are impacted by the issue if compiled with a
> > recent glibc which has getentropy() but run on Linux < 3.17.
> 
> Fortunately, python2.7 has not been rebuilt with glibc-2.24.90-23.fc26 in
> rawhide.
> Do you need a BZ for python2 as well?

Python 2 will end up rebuilt as part of the mass-rebuild scheduled for February 1st, so this will need to be addressed before then (or before any other python2 build).

Comment 15 Miro Hrončok 2017-01-04 18:22:24 UTC
(In reply to Lukas Slebodnik from comment #13)
> Fortunately, python2.7 has not been rebuilt with glibc-2.24.90-23.fc26 in
> rawhide.
> Do you need a BZ for python2 as well?

I don't think so.

Comment 16 Miro Hrončok 2017-01-04 18:57:25 UTC
Also affects python34 when rebuilt (tested with scratch build [0])

[0] https://koji.fedoraproject.org/koji/taskinfo?taskID=17163903

Comment 17 Lukas Slebodnik 2017-01-04 20:24:08 UTC
(In reply to Victor Stinner from comment #9)
> I proposed a patch upstream: http://bugs.python.org/issue29157
> 
> Python 2.7, 3.5, 3.6 and 3.7 are impacted by the issue if compiled with a
> recent glibc which has getentropy() but run on Linux < 3.17.

Works well for me, and getentropy() is not used because getrandom() is preferred at compile time.

sh# objdump -T /usr/lib64/libpython3.6m.so.1.0 | grep 2.25
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.25  getrandom
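
The same check can be scripted by parsing `objdump -T` output. This is only a sketch: the function name symbols_requiring is hypothetical, and it assumes the simple column layout shown above, where the last two fields of an undefined-symbol line are the version and the symbol name.

```python
def symbols_requiring(objdump_output, version):
    """List undefined dynamic symbols bound to the given symbol version."""
    names = []
    for line in objdump_output.splitlines():
        fields = line.split()
        # Only undefined symbols (imported from another library) are relevant.
        if len(fields) < 3 or "*UND*" not in fields:
            continue
        # objdump sometimes prints the version in parentheses; strip them.
        if fields[-2].strip("()") == version:
            names.append(fields[-1])
    return names
```

Fed the line above with version "GLIBC_2.25", it returns ["getrandom"]; the absence of getentropy from that list is what shows the library no longer references it.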

Comment 18 Miro Hrončok 2017-01-04 21:49:12 UTC
pypy3 and python33 are not affected.

Comment 19 Miro Hrončok 2017-01-05 14:10:34 UTC
(In reply to Victor Stinner from comment #9)
> Python 2.7, 3.5, 3.6 and 3.7 are impacted by the issue if compiled with a
> recent glibc which has getentropy() but run on Linux < 3.17.

I've just done a scratch build of Python 2.7 in Koji [0] and it does not seem affected; it starts just fine. Could it be that only 2.7.13 will be affected? We have 2.7.12.

[0] https://koji.fedoraproject.org/koji/taskinfo?taskID=17168612

Comment 20 Miro Hrončok 2017-01-06 10:39:13 UTC
I've built this with Victor Stinner's first patch in rawhide.

Keeping this open for other stuff that needs it:

 * python35 (already does not work)
 * python34 (will stop working when rebuilt)
 * python36 (not yet in Fedora, but in review)

Still unknown:

 * python (python2) (does not seem to be affected currently, might be when updated)

Comment 21 Victor Stinner 2017-01-06 23:13:27 UTC
I reworked Python/random.c in the default branch:
* Prefer getrandom() over getentropy()
* Handle ENOSYS, EPERM and EINTR errors when calling getentropy()

http://bugs.python.org/issue29157#msg284808
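
The general shape of that error handling can be sketched in Python as an ordered list of sources. This is an illustration under assumptions, not the code in Python/random.c: the function name first_working and the max_retries parameter are hypothetical, and the sources are plain callables so that an old kernel can be simulated with a fake.

```python
import errno


def first_working(sources, size, max_retries=16):
    """Return bytes from the first source in `sources` that works.

    Each source is a callable taking a size and returning bytes,
    listed in order of preference.
    """
    for source in sources:
        for _ in range(max_retries):
            try:
                return source(size)
            except OSError as exc:
                if exc.errno == errno.EINTR:
                    continue  # interrupted by a signal: retry this source
                if exc.errno in (errno.ENOSYS, errno.EPERM):
                    break  # not usable on this system: try the next source
                raise  # anything else is a genuine error
    raise RuntimeError("no working random source")
```

With a first source that raises OSError(errno.ENOSYS, ...), as getrandom() does on a 3.10 kernel, the loop moves on to the next source instead of aborting.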

I just backported changes into the 3.6 branch as a single change:
https://hg.python.org/cpython/rev/f8e24a0a1124

I will also backport the fix to Python 3.5 and 2.7. Right now, I'm waiting for the CI (buildbots) on the 3.6 branch.

> I've built this with Victor Stinner's first patch in rawhide.

Oh, I didn't propose to include my patch, but well, it's very simple and should fix all issues :-)

You will be able to get rid of this patch when rebasing the patches of the python3.6 package, once Python 3.6.1 is released.

Comment 22 Charalampos Stratakis 2017-01-09 12:32:29 UTC
Python 2.7.12 will not be updated yet to 2.7.13 as the rename review (for renaming to python2) is still pending so I'd like to address that first.

Comment 23 Victor Stinner 2017-01-09 13:54:06 UTC
I just backported changes into the 3.5 branch as a single change:
https://hg.python.org/cpython/rev/8125d9a8152b

I also pushed a tiny change into 2.7 to support the glibc 2.24, "Don't use getentropy() on Linux":
https://hg.python.org/cpython/rev/13a39142c047

Comment 24 Charalampos Stratakis 2017-01-10 21:26:02 UTC
(In reply to Victor Stinner from comment #23)
> I just backported changes into the 3.5 branch as a single change:
> https://hg.python.org/cpython/rev/8125d9a8152b
> 
> I also pushed a tiny change into 2.7 to support the glibc 2.24, "Don't use
> getentropy() on Linux":
> https://hg.python.org/cpython/rev/13a39142c047

Thanks for addressing this

Comment 25 Charalampos Stratakis 2017-01-10 22:52:15 UTC
python2 and python33 remain to be fixed

Comment 26 Miro Hrončok 2017-01-11 13:29:23 UTC
python33 is not affected

Comment 27 Jan Kurik 2017-01-11 15:55:40 UTC
The bug has been approved as a Fedora Priority Bug: https://fedoraproject.org/wiki/Fedora_Program_Management/Prioritized_bugs_and_issues

Comment 28 Charalampos Stratakis 2017-01-12 10:10:27 UTC
The upstream patches have been backported to all the affected interpreters.

Closing the bug. Please reopen it if you find any more issues related to that.

