Created attachment 1208560 [details] strace of firewalld when timeouting in getrandom Description of problem: firewalld does not start at system boot Version-Release number of selected component (if applicable): 0.4.3.3-1.fc24 How reproducible: Install Fedora 24 on a virtual machine (i guess this is the reason why randomness is not acuumulate in sufficient degree during system startup) Steps to Reproduce: 1. Install Fedora 24 2. Enable firewalld and configure it 3. reboot Actual results: firewalld most times does not come up. Checking with a wrapper script preceding firewalld with strace ... shows, that it hangs for notable time (70 and more seconds) in the getrandom system call. When timeout of systemd occurs (90 seconds), what happens most of the system boots, firewalld is killed and startup considered failed. Expected results: firewalld is started by systemd and continues running Additional info:
Note: This is security-relevant. When the firewalld is not started, the related iptables rules are not loaded. Unless the admin explicitely checks, whether the daemon is running or looks at the iptables rules, he is not getting aware, that the firewall is not active.
This is not in firewalld. There seems to be an issue even before the python interpreter is used. These is the normal strace output starting with open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY): 26411 12:25:19.522030 open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3 <0.000024> 26411 12:25:19.522088 fstat(3, {st_mode=S_IFREG|0644, st_size=26254, ...}) = 0 <0.000015> 26411 12:25:19.522136 mmap(NULL, 26254, PROT_READ, MAP_SHARED, 3, 0) = 0x7f4dc1e5c000 <0.000019> 26411 12:25:19.522184 close(3) = 0 <0.000014> 26411 12:25:19.522228 futex(0x7f4dc0e458a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000016> 26411 12:25:19.522318 getrandom("\374?\361\245\246{\4\"\27\230\250\0\33\343\224\336t\375\210\331\7T\377\24", 24, 0) = 24 <0.000026> 26411 12:25:19.522394 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000018> 26411 12:25:19.522494 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dc1de9000 <0.000021> 26411 12:25:19.524967 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dc1da9000 <0.000017> 26411 12:25:19.525128 brk(NULL) = 0x5637826ec000 <0.000011> 26411 12:25:19.525164 brk(0x56378270d000) = 0x56378270d000 <0.000014> 26411 12:25:19.525369 fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0 <0.000014> 26411 12:25:19.525477 readlink("/usr/bin/python3", "python3.5", 4096) = 9 <0.000017> 26411 12:25:19.525538 readlink("/usr/bin/python3.5", 0x7fffd53a2c90, 4096) = -1 EINVAL (Invalid argument) <0.000014> ... This seems to be a python issue. There was abug about getrandom for python3: #1350123
Reassigning to python3
Frankly i'm a bit undecided how to judge this. Python should not unconditionally call getrandom in blocking mode. If it does, it is not usable for services starting early during system startup. Either use getrandom finally, when really needing it, or set appropriate flags GRND_NONBLOCK and evaluate errno. However, as far as i have seen, python calls getrandom with the 3rd argument set to 0 (particularly GRND_RANDOM is not set), what means /dev/urandom is used, what should not block. If reading from /dev/urandom blocks, there is sth. wrong in the kernel. Seems to be an insane combination of several factors causing this problem.
I believe that this is the relevant upstream issue: http://bugs.python.org/issue26839 This blocking behaviour of `os.urandom()` (provided that the `getrandom` syscall is available on the system) was added in Python 3.5, but was then reverted in Python 3.5.2 (which is in F25). Fedora 24 was on Python 3.5 from the start—the reason why this issue popped up only now is that there was a bug in Python 3.5 [0] in detecting the availability of the `getrandom` syscall (whose default behaviour in Python 3.5 is to block) and thus the old method of reading `/dev/urandom` was being used (which does not block). (Un)fortunately I recently fixed that bug [0] and which gave rise to this one. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1350123 In light of this, it seems we will have to backport the patches from Python 3.5.2 that stop the blocking behaviour.
After a discussion on #fedora-python, I've decided to update Python to 3.5.2 in Fedora 24 instead of backporting the patches therefrom.
(In reply to Tomas Orsava from comment #6) > After a discussion on #fedora-python, I've decided to update Python to 3.5.2 > in Fedora 24 instead of backporting the patches therefrom. I have updated my Fedora 24 installation to Python 3.5.2 from [1] and I can confirm the issue is gone. [1] http://koji.fedoraproject.org/koji/taskinfo?taskID=16087644
python3-3.5.2-3.fc24, python3-docs-3.5.2-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-cd0636887f
(In reply to Vedran Miletić from comment #7) > (In reply to Tomas Orsava from comment #6) > > After a discussion on #fedora-python, I've decided to update Python to 3.5.2 > > in Fedora 24 instead of backporting the patches therefrom. > > I have updated my Fedora 24 installation to Python 3.5.2 from [1] and I can > confirm the issue is gone. > > [1] http://koji.fedoraproject.org/koji/taskinfo?taskID=16087644 That's great! If you can, please indicate so using karma as well: https://bodhi.fedoraproject.org/updates/python3-docs-3.5.2-1.fc24%20python3-3.5.2-3.fc24
Looks good. firewalld is coming up within 20 seconds, typically after 10. No other issues seen after the python upgrade. Thank you !
python3-3.5.2-3.fc24, python3-docs-3.5.2-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.