Bug 1383060 - firewalld does not start hanging in getrandom and gets killed by systemd after timeout
Summary: firewalld does not start hanging in getrandom and gets killed by systemd afte...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: python3
Version: 24
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Tomas Orsava
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-10-09 12:20 UTC by Albert Flügel
Modified: 2016-11-05 07:19 UTC (History)
10 users (show)

Fixed In Version: python3-3.5.2-3.fc24
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-05 07:19:11 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
strace of firewalld when timeouting in getrandom (9.92 KB, text/plain)
2016-10-09 12:20 UTC, Albert Flügel
no flags Details

Description Albert Flügel 2016-10-09 12:20:31 UTC
Created attachment 1208560 [details]
strace of firewalld when timeouting in getrandom

Description of problem:
firewalld does not start at system boot

Version-Release number of selected component (if applicable):
0.4.3.3-1.fc24

How reproducible:
Install Fedora 24 on a virtual machine (i guess this is the reason why randomness is not acuumulate in sufficient degree during system startup)

Steps to Reproduce:
1. Install Fedora 24
2. Enable firewalld and configure it
3. reboot

Actual results:
firewalld most times does not come up. Checking with a wrapper script preceding firewalld with strace ... shows, that it hangs for notable time (70 and more seconds) in the getrandom system call. When timeout of systemd occurs (90 seconds), what happens most of the system boots, firewalld is killed and startup considered failed.

Expected results:
firewalld is started by systemd and continues running

Additional info:

Comment 1 Albert Flügel 2016-10-09 13:12:54 UTC
Note: This is security-relevant.
When the firewalld is not started, the related iptables rules are not loaded. Unless the admin explicitely checks, whether the daemon is running or looks at the iptables rules, he is not getting aware, that the firewall is not active.

Comment 2 Thomas Woerner 2016-10-10 13:48:35 UTC
This is not in firewalld. There seems to be an issue even before the python interpreter is used.

These is the normal strace output starting with open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY):

26411 12:25:19.522030 open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3 <0.000024>
26411 12:25:19.522088 fstat(3, {st_mode=S_IFREG|0644, st_size=26254, ...}) = 0 <0.000015>
26411 12:25:19.522136 mmap(NULL, 26254, PROT_READ, MAP_SHARED, 3, 0) = 0x7f4dc1e5c000 <0.000019>
26411 12:25:19.522184 close(3)          = 0 <0.000014>
26411 12:25:19.522228 futex(0x7f4dc0e458a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000016>
26411 12:25:19.522318 getrandom("\374?\361\245\246{\4\"\27\230\250\0\33\343\224\336t\375\210\331\7T\377\24", 24, 0) = 24 <0.000026>
26411 12:25:19.522394 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000018>
26411 12:25:19.522494 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dc1de9000 <0.000021>
26411 12:25:19.524967 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dc1da9000 <0.000017>
26411 12:25:19.525128 brk(NULL)         = 0x5637826ec000 <0.000011>
26411 12:25:19.525164 brk(0x56378270d000) = 0x56378270d000 <0.000014>
26411 12:25:19.525369 fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0 <0.000014>
26411 12:25:19.525477 readlink("/usr/bin/python3", "python3.5", 4096) = 9 <0.000017>
26411 12:25:19.525538 readlink("/usr/bin/python3.5", 0x7fffd53a2c90, 4096) = -1 EINVAL (Invalid argument) <0.000014>

...

This seems to be a python issue. There was abug about getrandom for python3: #1350123

Comment 3 Thomas Woerner 2016-10-10 13:49:26 UTC
Reassigning to python3

Comment 4 Albert Flügel 2016-10-10 14:30:30 UTC
Frankly i'm a bit undecided how to judge this. Python should not unconditionally call getrandom in blocking mode. If it does, it is not usable for services starting early during system startup. Either use getrandom finally, when really needing it, or set appropriate flags GRND_NONBLOCK and evaluate errno.
However, as far as i have seen, python calls getrandom with the 3rd argument set to 0 (particularly GRND_RANDOM is not set), what means /dev/urandom is used, what should not block. If reading from /dev/urandom blocks, there is sth. wrong in the kernel.
Seems to be an insane combination of several factors causing this problem.

Comment 5 Tomas Orsava 2016-10-10 16:33:00 UTC
I believe that this is the relevant upstream issue: http://bugs.python.org/issue26839

This blocking behaviour of `os.urandom()` (provided that the `getrandom` syscall is available on the system) was added in Python 3.5, but was then reverted in Python 3.5.2 (which is in F25).

Fedora 24 was on Python 3.5 from the start—the reason why this issue popped up only now is that there was a bug in Python 3.5 [0] in detecting the availability of the `getrandom` syscall (whose default behaviour in Python 3.5 is to block) and thus the old method of reading `/dev/urandom` was being used (which does not block). (Un)fortunately I recently fixed that bug [0] and which gave rise to this one.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1350123


In light of this, it seems we will have to backport the patches from Python 3.5.2 that stop the blocking behaviour.

Comment 6 Tomas Orsava 2016-10-11 11:21:22 UTC
After a discussion on #fedora-python, I've decided to update Python to 3.5.2 in Fedora 24 instead of backporting the patches therefrom.

Comment 7 Vedran Miletić 2016-10-14 15:11:13 UTC
(In reply to Tomas Orsava from comment #6)
> After a discussion on #fedora-python, I've decided to update Python to 3.5.2
> in Fedora 24 instead of backporting the patches therefrom.

I have updated my Fedora 24 installation to Python 3.5.2 from [1] and I can confirm the issue is gone.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=16087644

Comment 8 Fedora Update System 2016-10-14 21:22:58 UTC
python3-3.5.2-3.fc24, python3-docs-3.5.2-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-cd0636887f

Comment 9 Tomas Orsava 2016-10-17 09:41:09 UTC
(In reply to Vedran Miletić from comment #7)
> (In reply to Tomas Orsava from comment #6)
> > After a discussion on #fedora-python, I've decided to update Python to 3.5.2
> > in Fedora 24 instead of backporting the patches therefrom.
> 
> I have updated my Fedora 24 installation to Python 3.5.2 from [1] and I can
> confirm the issue is gone.
> 
> [1] http://koji.fedoraproject.org/koji/taskinfo?taskID=16087644

That's great! If you can, please indicate so using karma as well:

https://bodhi.fedoraproject.org/updates/python3-docs-3.5.2-1.fc24%20python3-3.5.2-3.fc24

Comment 10 Albert Flügel 2016-10-17 11:50:43 UTC
Looks good. firewalld is coming up within 20 seconds, typically after 10.
No other issues seen after the python upgrade.

Thank you !

Comment 11 Fedora Update System 2016-11-05 07:19:11 UTC
python3-3.5.2-3.fc24, python3-docs-3.5.2-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.