Created attachment 1454522 [details] lshw from the system where firewalld.service fails to start with kernel 4.17.2-200.fc28.x86_64 Description of problem: On a system with kernel 4.17.2-200.fc28.x86_64 on an PC ASUS P9X79 PRO with graphic card NVIDIA GF110 [GeForce GTX 580] and Intel 82579V network card firewalld.service fails to start with timeout. The system contains a fresh installed Fedora 28 system with all current updates. Jun 26 03:26:12 systemd[1]: Starting firewalld - dynamic firewall daemon... Jun 26 03:27:42 systemd[1]: firewalld.service: Start operation timed out. Terminating. Jun 26 03:27:42 systemd[1]: firewalld.service: Failed with result 'timeout'. Jun 26 03:27:42 systemd[1]: Failed to start firewalld - dynamic firewall daemon. This are the only messages that appear. I have tried the --debug option but I found no clue what the reason for the failure can be. I have installed and booted kernel 4.16.16-300.fc28.x86_64. With this kernel, firewalld.service starts normally. I have this behavior only on this hardware, on other servers and pcs firewalld starts normally also with kernel 4.17.2-200.fc28.x86_64. I attach the output of lshw from this system. Version-Release number of selected component (if applicable): kernel-4.17.2-200.fc28.x86_64 (firewalld failed to start) kernel-4.16.16-300.fc28.x86_64 (firewalld starts normally) firewalld-0.5.3-1.fc28.noarch fail2ban-0.10.3.1-2.fc28.noarch rpcbind-0.2.4-10.rc3.fc28.x86_64 ypbind-2.4-8.fc28.x86_64 How reproducible: Always
Hi, Thanks for the bug report. Can you attach dmesg from "journalctl -k --no-hostname > dmesg"? It also looks like firewalld has several debug levels. Does "firewalld --nofork --debug=10" produce any logs? I'm not sure exactly what that will produce, so you may want to look through it before attaching the output to make sure there's no data in there you'd like to keep private.
Created attachment 1454698 [details] output of "journalctl -b -2 -k --no-hostname > dmesg" (the last boot with kernel 4.17.2-200.fc28.x86_64)
Created attachment 1454706 [details] /var/log/firewalld (with some ip subnets substituted) (In reply to Jeremy Cline from comment #1) > It also looks like firewalld has several debug levels. Does "firewalld > --nofork --debug=10" produce any logs? I'm not sure exactly what that will > produce, so you may want to look through it before attaching the output to > make sure there's no data in there you'd like to keep private. When I try to (re)start firewalld after reboot, e.g. when I can get a terminal / shell to run the command that you suggest, then firewalld starts without errors, with each kernel. It only fails during boot, and at this time I cannot run this command. But I have modified /etc/sysconfig/firewalld to contain the line FIREWALLD_ARGS=--debug=10 and currently FIREWALLD_ARGS=--debug The dates of the files show the time of change: -rw-r--r--. 1 root root 80 26. Jun 01:17 /etc/sysconfig/firewalld -rw-r--r--. 1 root root 83 25. Jun 20:14 /etc/sysconfig/firewalld~ With these changes firewalld had logged to /var/log/firewalld also during boot. I add the times of the last reboots so you can see when the system was rebooted and with which kernel, so you can associate this times with the times in the log file. # last reboot|head reboot system boot 4.16.16-300.fc28 Tue Jun 26 03:54 still running reboot system boot 4.16.16-300.fc28 Tue Jun 26 03:53 - 03:54 (00:00) reboot system boot 4.17.2-200.fc28. Tue Jun 26 03:26 - 03:52 (00:26) reboot system boot 4.16.16-300.fc28 Tue Jun 26 01:36 - 03:25 (01:49) reboot system boot 4.17.2-200.fc28. Mon Jun 25 23:44 - 01:35 (01:51) reboot system boot 4.17.2-200.fc28. Mon Jun 25 23:31 - 23:43 (00:12) reboot system boot 4.17.2-200.fc28. Mon Jun 25 22:20 - 23:30 (01:09) reboot system boot 4.17.2-200.fc28. Mon Jun 25 21:40 - 22:20 (00:39) reboot system boot 4.17.2-200.fc28. Mon Jun 25 20:51 - 21:40 (00:48) reboot system boot 4.17.2-200.fc28. Mon Jun 25 20:15 - 20:51 (00:36)
Hmm, okay. It looks like the firewalld log contains nothing at all for the 4.17.2-200.fc28 Tue Jun 26 03:26 - 03:52 boot. Perhaps "systemd-analyze critical-chain firewalld.service" shows something of interest. You can see where it's hanging in the kernel log: Jun 26 03:26:12 kernel: Bluetooth: BNEP socket layer initialized Jun 26 03:27:42 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready <snip> Jun 26 03:27:46 kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx The first two lines are exactly 90 seconds apart, which is the default timeout for systemd services. firewalld.service has a "Before=network-pre.target", meaning it wants to start before the network is configured[0]. The fact that it started up fine after it's booted and the network is configured makes me think this has something to do with that. I see you're running an out of tree kernel module. I'm not familiar with how VirtualBox does networking, but I do see it's doing something with that. Can you reproduce this with a fresh install or otherwise untainted kernel? [0] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
I have removed Packages akmod-wl akmod-VirtualBox kmod-wl-* kmod-VirtualBox-*, but the behavior is still the same. I have set "FIREWALLD_ARGS=--debug=10" in /etc/sysconfig/firewalld, but it seams that firewalld does not write anything to file /var/log/firewalld if it fails to start, independent of the --debug* option. When I start firewalld after booting has finished (in a shell), then /var/log/firewalld gets filled with log lines, but in this case no error occurs, so I think this gives us not the information we need. I also tried to comment out lines in /etc/systemd/system/firewalld.service: #StandardOutput=null #StandardError=null But /var/log/firewalld does still not exist (after I renamed the existing file before reboot). What else can I do?
Can you include the output of "systemd-analyze critical-chain firewalld.service" after a boot of 4.17.2? The fact that firewalld isn't logging anything at all makes me think it's waiting on a dependency which isn't met and causes the service to time out, although I'm not certain about that. We need to figure out what it's waiting on and I don't see anything in the kernel logs that indicates a problem there. Perhaps also include the full system logs?
Created attachment 1454793 [details] Output of "systemd-analyze critical-chain firewalld.service"
Created attachment 1454794 [details] Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced
Created attachment 1454810 [details] Output of "systemd-analyze critical-chain firewalld.service" with kernel 4.16.16-300.fc28.x86_64
Created attachment 1454811 [details] Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced with kernel 4.16.16-300.fc28.x86_64 I have booted the system also with kernel 4.16.16-300.fc28.x86_64 and added the logs so you can see the difference.
Aha! There's a lot of services not starting, all due to SELinux denials ranging from dbus-daemon being unable to name_bind: audit[1215]: AVC avc: denied { name_bind } for pid=1215 comm="dbus-daemon" src=967 scontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:hi_reserved_port_t:s0 tclass=tcp_socket permissive=0 to a whole bunch of processes not being able to read the core dump pattern: AVC avc: denied { read } for pid=1900 comm="firewall-cmd" name="core_pattern" dev="proc" ino=17472 scontext=system_u:system_r:fail2ban_t:s0 tcontext=system_u:object_r:usermodehelper_t:s0 tclass=file permissive=0 You're also getting these denials in the 4.16.16, but it seems like maybe there's an ordering change that's causing it to be a problem. In any case, you shouldn't be getting any of these denials. You could try relabeling the system to see if that fixes things up, but if it doesn't it might be best to see if you can reproduce this with a fresh install of F28 that's been updated. If not, start adding things until you figure out what broke the SELinux labels.
Created attachment 1454827 [details] Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced, in permissive mode Thanks for your analysis. The system was new installed yesterday (with kickstart), and I have installed all updates. I did also a complete relabeling ("/sbin/fixfiles onboot"). I also have booted with SELinux in permissive mode (in /etc/selinux/config set "SELINUX=permissive"), but with the same results. "restorecon -rv" only changes /var/cache/dnf/* and /var/lib/ghc/pkg-dir.cache (not relevant for firewalld), and the following, which is not permanent over reboots: Relabeled /run/user/0/dbus-1 from unconfined_u:object_r:session_dbusd_tmp_t:s0 to unconfined_u:object_r:user_tmp_t:s0 Relabeled /run/user/0/dbus-1/services from unconfined_u:object_r:session_dbusd_tmp_t:s0 to unconfined_u:object_r:user_tmp_t:s0 There is a new package in selinux-policy-3.14.1-33.fc28 in koji, which includes a lots of fixes, but its build state is failed: https://koji.fedoraproject.org/koji/packageinfo?packageID=32 I have booted the system again in permissive mode, and attached the journalctl output. As far as I understand the permissive mode, the denials are listed but allowed, so this should not be the reason of firewalld failure? The only difference between working firewalld and failing firewalld are the kernels that I have used to boot, so I think something in the kernel has changed hat causes the problem. And yes, it may be an ordering problem - can this be caused by the kernel?
(In reply to Edgar Hoch from comment #12) > Created attachment 1454827 [details] > Output of "journalctl -b --no-hostname" with hostnames and ip addresses > replaced, in permissive mode > > Thanks for your analysis. The system was new installed yesterday (with > kickstart), and I have installed all updates. I did also a complete > relabeling ("/sbin/fixfiles onboot"). > > I also have booted with SELinux in permissive mode (in /etc/selinux/config > set "SELINUX=permissive"), but with the same results. "restorecon -rv" only > changes /var/cache/dnf/* and /var/lib/ghc/pkg-dir.cache (not relevant for > firewalld), and the following, which is not permanent over reboots: > > Relabeled /run/user/0/dbus-1 from > unconfined_u:object_r:session_dbusd_tmp_t:s0 to > unconfined_u:object_r:user_tmp_t:s0 > Relabeled /run/user/0/dbus-1/services from > unconfined_u:object_r:session_dbusd_tmp_t:s0 to > unconfined_u:object_r:user_tmp_t:s0 > > There is a new package in selinux-policy-3.14.1-33.fc28 in koji, which > includes a lots of fixes, but its build state is failed: > https://koji.fedoraproject.org/koji/packageinfo?packageID=32 > > I have booted the system again in permissive mode, and attached the > journalctl output. > > As far as I understand the permissive mode, the denials are listed but > allowed, so this should not be the reason of firewalld failure? Can you try doing a fresh installation of Fedora Workstation, updating it, and seeing if you still get the SELinux denials and firewalld still fails to start? I'm wondering if something you're doing installing is changing the policy. > > The only difference between working firewalld and failing firewalld are the > kernels that I have used to boot, so I think something in the kernel has > changed hat causes the problem. And yes, it may be an ordering problem - can > this be caused by the kernel? I'm not ruling out a kernel bug here, but there's nothing in the kernel log and there shouldn't be any SELinux denials happening. Let's narrow down on why those are occurring. If you can reproduce these SELinux denials on a clean Workstation installation, that's a problem we need to fix in the SELinux policy. If not, something you're installing is breaking the policy in some way and that should be fixed.
I have done a fresh installation of Fedora 28 with my kickstart file, this has worked fine, firewalld was running. I then applied the contents of my after-kickstart scripts manually step by step. One part is to install additional packages. I found out that firewalld fails to start if package python3-coverage_pth is installed, and it starts without error if I remove the package. But it seams to me that package python3-coverage_pth doesn't have files that causes the problem, and it have not scripts and triggers. The description tells: "A .pth file to site-packages to enable coverage.py." So it seams to activate python3-coverage. I have removed python3-coverage, it also removed packages anaconda anaconda-core anaconda-gui anaconda-tui initial-setup initial-setup-gui kexec-tools-anaconda-addon. As expected, firewalld was started normally. Then I have installed python3-coverage_pth again, which required (only) python3-coverage. In this configuration firewalld also failed starting. If I install one of packages anaconda anaconda-core anaconda-gui anaconda-tui initial-setup initial-setup-gui kexec-tools-anaconda-addon, they install package python3-coverage as a dependency, too. I suppose that firewalld is influenced by python3-coverage it it is activated by python3-coverage_pth, because firewalld is written in python itself. It seems that it may depend on hardware, because I have python3-coverage_pth installed on other hosts too, and on these hosts firewalld is running. The main difference to other hosts is that this has the following graphic card installed (line from lspci output): 01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1) I am wondering if (and why) installing a package does have such consequences. It seams that installation of package python3-coverage_pth is like a flag which is set to on if the package is installed, and to off, if it is not installed. But this is - in my opinion - a misuse of a package. A better solution would be a program which can be called by an administrator to set the flag on or off, which writes something to a file (for example, in /etc or /var), or even a config file (in /etc) which can be changed by an administrator. Now it seems that we know that an activated python3-coverage is blocking something in firewalld until it runs into timeout, but only on some hardware. It seems we need a deeper look into the code of firewalld and python3-coverage. But currently I don't know how to find where firewalld is waiting during booting.
I want to note that the new installation of comment #14 is running kernel-4.17.3-200.fc28.x86_64.
If you are wondering why package python3-coverage_pth is on my installation list: I try to install as much (useful) packages as possible, in this case as many python packages as possible (some has conflicts and I have to exclude them), to provide our users nearly all available python modules. I don't want to add a package to the list every time a user will need it and send me a request. I want to install them once and then users can use them. So I use wildcards in some package installation lists, and one pattern is "python[23]-py*" which includes python3-pytest-testmon which requires python3-coverage_pth, so this package was installed.
Okay, I'm going to assign this to coverage_pth as I'm quite confident it's not a kernel bug. I don't really know why anything depends on coverage_pth in the distribution, as it sounds like it's meant for Python virtualenvs and seems to just install a data file that enables test coverage (which doesn't make sense to do outside of development). Since I don't understand the purpose of coverage_pth I'm not sure what the fix will be, but for the time being I'd recommend not installing coverage_pth.
Edgar, Can you add "--debug" to FIREWALLD_ARGS= in /etc/sysconfig/firewalld. Then reproduced it again and post /var/log/firewalld.
(In reply to Eric Garver from comment #18) > Edgar, Can you add "--debug" to FIREWALLD_ARGS= in /etc/sysconfig/firewalld. > Then reproduced it again and post /var/log/firewalld. Scratch that. It's already attached to this BZ. Thanks!
This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
I have no problems with Fedora 29, and coverage_pth doesn't exist in Fedora 29. I close this bug now.