Bug 1595041 - firewalld fails to start at boot with kernel 4.17.2-200.fc28.x86_64
Summary: firewalld fails to start at boot with kernel 4.17.2-200.fc28.x86_64
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: firewalld
Version: 28
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Eric Garver
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-26 01:55 UTC by Edgar Hoch
Modified: 2019-05-02 22:22 UTC (History)
22 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-05-02 22:22:38 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
lshw from the system where firewalld.service fails to start with kernel 4.17.2-200.fc28.x86_64 (38.06 KB, text/plain)
2018-06-26 01:55 UTC, Edgar Hoch
no flags Details
output of "journalctl -b -2 -k --no-hostname > dmesg" (the last boot with kernel 4.17.2-200.fc28.x86_64) (83.05 KB, text/x-vhdl)
2018-06-26 15:32 UTC, Edgar Hoch
no flags Details
/var/log/firewalld (with some ip subnets substituted) (714.54 KB, text/plain)
2018-06-26 15:50 UTC, Edgar Hoch
no flags Details
Output of "systemd-analyze critical-chain firewalld.service" (796 bytes, text/plain)
2018-06-26 21:12 UTC, Edgar Hoch
no flags Details
Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced (246.11 KB, text/plain)
2018-06-26 21:13 UTC, Edgar Hoch
no flags Details
Output of "systemd-analyze critical-chain firewalld.service" with kernel 4.16.16-300.fc28.x86_64 (781 bytes, text/plain)
2018-06-26 21:28 UTC, Edgar Hoch
no flags Details
Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced with kernel 4.16.16-300.fc28.x86_64 (446.11 KB, text/plain)
2018-06-26 21:29 UTC, Edgar Hoch
no flags Details
Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced, in permissive mode (254.65 KB, text/plain)
2018-06-26 22:33 UTC, Edgar Hoch
no flags Details

Description Edgar Hoch 2018-06-26 01:55:20 UTC
Created attachment 1454522 [details]
lshw from the system where firewalld.service fails to start with kernel 4.17.2-200.fc28.x86_64

Description of problem:

On a system with kernel 4.17.2-200.fc28.x86_64 on an PC ASUS P9X79 PRO with graphic card NVIDIA GF110 [GeForce GTX 580] and Intel 82579V network card
firewalld.service fails to start with timeout. The system contains a fresh installed Fedora 28 system with all current updates.


Jun 26 03:26:12 systemd[1]: Starting firewalld - dynamic firewall daemon...
Jun 26 03:27:42 systemd[1]: firewalld.service: Start operation timed out. Terminating.
Jun 26 03:27:42 systemd[1]: firewalld.service: Failed with result 'timeout'.
Jun 26 03:27:42 systemd[1]: Failed to start firewalld - dynamic firewall daemon.


This are the only messages that appear.
I have tried the --debug option but I found no clue what the reason for the failure can be.

I have installed and booted kernel 4.16.16-300.fc28.x86_64. With this kernel, firewalld.service starts normally.

I have this behavior only on this hardware, on other servers and pcs firewalld starts normally also with kernel 4.17.2-200.fc28.x86_64. I attach the output of lshw from this system.


Version-Release number of selected component (if applicable):
kernel-4.17.2-200.fc28.x86_64 (firewalld failed to start)
kernel-4.16.16-300.fc28.x86_64 (firewalld starts normally)
firewalld-0.5.3-1.fc28.noarch
fail2ban-0.10.3.1-2.fc28.noarch
rpcbind-0.2.4-10.rc3.fc28.x86_64
ypbind-2.4-8.fc28.x86_64


How reproducible:
Always

Comment 1 Jeremy Cline 2018-06-26 15:25:27 UTC
Hi,

Thanks for the bug report. Can you attach dmesg from "journalctl -k --no-hostname > dmesg"?

It also looks like firewalld has several debug levels. Does "firewalld --nofork --debug=10" produce any logs? I'm not sure exactly what that will produce, so you may want to look through it before attaching the output to make sure there's no data in there you'd like to keep private.

Comment 2 Edgar Hoch 2018-06-26 15:32:33 UTC
Created attachment 1454698 [details]
output of "journalctl -b -2 -k --no-hostname > dmesg" (the last boot with kernel 4.17.2-200.fc28.x86_64)

Comment 3 Edgar Hoch 2018-06-26 15:50:00 UTC
Created attachment 1454706 [details]
/var/log/firewalld (with some ip subnets substituted)

(In reply to Jeremy Cline from comment #1)
> It also looks like firewalld has several debug levels. Does "firewalld
> --nofork --debug=10" produce any logs? I'm not sure exactly what that will
> produce, so you may want to look through it before attaching the output to
> make sure there's no data in there you'd like to keep private.

When I try to (re)start firewalld after reboot, e.g. when I can get a terminal / shell to run the command that you suggest, then firewalld starts without errors, with each kernel. It only fails during boot, and at this time I cannot run this command.

But I have modified /etc/sysconfig/firewalld to contain the line
FIREWALLD_ARGS=--debug=10

and currently
FIREWALLD_ARGS=--debug

The dates of the files show the time of change:
-rw-r--r--. 1 root root 80 26. Jun 01:17 /etc/sysconfig/firewalld
-rw-r--r--. 1 root root 83 25. Jun 20:14 /etc/sysconfig/firewalld~

With these changes firewalld had logged to /var/log/firewalld also during boot.


I add the times of the last reboots so you can see when the system was rebooted and with which kernel, so you can associate this times with the times in the log file.

# last reboot|head
reboot   system boot  4.16.16-300.fc28 Tue Jun 26 03:54   still running
reboot   system boot  4.16.16-300.fc28 Tue Jun 26 03:53 - 03:54  (00:00)
reboot   system boot  4.17.2-200.fc28. Tue Jun 26 03:26 - 03:52  (00:26)
reboot   system boot  4.16.16-300.fc28 Tue Jun 26 01:36 - 03:25  (01:49)
reboot   system boot  4.17.2-200.fc28. Mon Jun 25 23:44 - 01:35  (01:51)
reboot   system boot  4.17.2-200.fc28. Mon Jun 25 23:31 - 23:43  (00:12)
reboot   system boot  4.17.2-200.fc28. Mon Jun 25 22:20 - 23:30  (01:09)
reboot   system boot  4.17.2-200.fc28. Mon Jun 25 21:40 - 22:20  (00:39)
reboot   system boot  4.17.2-200.fc28. Mon Jun 25 20:51 - 21:40  (00:48)
reboot   system boot  4.17.2-200.fc28. Mon Jun 25 20:15 - 20:51  (00:36)

Comment 4 Jeremy Cline 2018-06-26 18:08:39 UTC
Hmm, okay. It looks like the firewalld log contains nothing at all for the 4.17.2-200.fc28 Tue Jun 26 03:26 - 03:52 boot. Perhaps "systemd-analyze critical-chain firewalld.service" shows something of interest.

You can see where it's hanging in the kernel log:

Jun 26 03:26:12 kernel: Bluetooth: BNEP socket layer initialized
Jun 26 03:27:42 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
<snip>
Jun 26 03:27:46 kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

The first two lines are exactly 90 seconds apart, which is the default timeout for systemd services. firewalld.service has a "Before=network-pre.target", meaning it wants to start before the network is configured[0]. The fact that it started up fine after it's booted and the network is configured makes me think this has something to do with that.

I see you're running an out of tree kernel module. I'm not familiar with how VirtualBox does networking, but I do see it's doing something with that. Can you reproduce this with a fresh install or otherwise untainted kernel?

[0] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Comment 5 Edgar Hoch 2018-06-26 20:07:54 UTC
I have removed Packages akmod-wl akmod-VirtualBox kmod-wl-* kmod-VirtualBox-*, but the behavior is still the same.

I have set "FIREWALLD_ARGS=--debug=10" in /etc/sysconfig/firewalld, but it seams that firewalld does not write anything to file /var/log/firewalld if it fails to start, independent of the --debug* option.
When I start firewalld after booting has finished (in a shell), then /var/log/firewalld gets filled with log lines, but in this case no error occurs, so I think this gives us not the information we need.

I also tried to comment out lines in /etc/systemd/system/firewalld.service:
#StandardOutput=null
#StandardError=null
But /var/log/firewalld does still not exist (after I renamed the existing file before reboot).

What else can I do?

Comment 6 Jeremy Cline 2018-06-26 20:55:29 UTC
Can you include the output of "systemd-analyze critical-chain firewalld.service" after a boot of 4.17.2?

The fact that firewalld isn't logging anything at all makes me think it's waiting on a dependency which isn't met and causes the service to time out, although I'm not certain about that. We need to figure out what it's waiting on and I don't see anything in the kernel logs that indicates a problem there. Perhaps also include the full system logs?

Comment 7 Edgar Hoch 2018-06-26 21:12:28 UTC
Created attachment 1454793 [details]
Output of "systemd-analyze critical-chain firewalld.service"

Comment 8 Edgar Hoch 2018-06-26 21:13:30 UTC
Created attachment 1454794 [details]
Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced

Comment 9 Edgar Hoch 2018-06-26 21:28:20 UTC
Created attachment 1454810 [details]
Output of "systemd-analyze critical-chain firewalld.service" with kernel 4.16.16-300.fc28.x86_64

Comment 10 Edgar Hoch 2018-06-26 21:29:20 UTC
Created attachment 1454811 [details]
Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced with kernel 4.16.16-300.fc28.x86_64

I have booted the system also with kernel 4.16.16-300.fc28.x86_64 and added the logs so you can see the difference.

Comment 11 Jeremy Cline 2018-06-26 21:55:46 UTC
Aha! There's a lot of services not starting, all due to SELinux denials ranging from dbus-daemon being unable to name_bind:

audit[1215]: AVC avc:  denied  { name_bind } for  pid=1215 comm="dbus-daemon" src=967 scontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:hi_reserved_port_t:s0 tclass=tcp_socket permissive=0

to a whole bunch of processes not being able to read the core dump pattern:

AVC avc:  denied  { read } for  pid=1900 comm="firewall-cmd" name="core_pattern" dev="proc" ino=17472 scontext=system_u:system_r:fail2ban_t:s0 tcontext=system_u:object_r:usermodehelper_t:s0 tclass=file permissive=0

You're also getting these denials in the 4.16.16, but it seems like maybe there's an ordering change that's causing it to be a problem. In any case, you shouldn't be getting any of these denials.

You could try relabeling the system to see if that fixes things up, but if it doesn't it might be best to see if you can reproduce this with a fresh install of F28 that's been updated. If not, start adding things until you figure out what broke the SELinux labels.

Comment 12 Edgar Hoch 2018-06-26 22:33:31 UTC
Created attachment 1454827 [details]
Output of "journalctl -b --no-hostname" with hostnames and ip addresses replaced, in permissive mode

Thanks for your analysis. The system was new installed yesterday (with kickstart), and I have installed all updates. I did also a complete relabeling ("/sbin/fixfiles onboot").

I also have booted with SELinux in permissive mode (in /etc/selinux/config set "SELINUX=permissive"), but with the same results. "restorecon -rv" only changes /var/cache/dnf/* and /var/lib/ghc/pkg-dir.cache (not relevant for firewalld), and the following, which is not permanent over reboots:

Relabeled /run/user/0/dbus-1 from unconfined_u:object_r:session_dbusd_tmp_t:s0 to unconfined_u:object_r:user_tmp_t:s0
Relabeled /run/user/0/dbus-1/services from unconfined_u:object_r:session_dbusd_tmp_t:s0 to unconfined_u:object_r:user_tmp_t:s0

There is a new package in selinux-policy-3.14.1-33.fc28 in koji, which includes a lots of fixes, but its build state is failed:
https://koji.fedoraproject.org/koji/packageinfo?packageID=32

I have booted the system again in permissive mode, and attached the journalctl output.

As far as I understand the permissive mode, the denials are listed but allowed, so this should not be the reason of firewalld failure?

The only difference between working firewalld and failing firewalld are the kernels that I have used to boot, so I think something in the kernel has changed hat causes the problem. And yes, it may be an ordering problem - can this be caused by the kernel?

Comment 13 Jeremy Cline 2018-06-28 18:30:25 UTC
(In reply to Edgar Hoch from comment #12)
> Created attachment 1454827 [details]
> Output of "journalctl -b --no-hostname" with hostnames and ip addresses
> replaced, in permissive mode
> 
> Thanks for your analysis. The system was new installed yesterday (with
> kickstart), and I have installed all updates. I did also a complete
> relabeling ("/sbin/fixfiles onboot").
> 
> I also have booted with SELinux in permissive mode (in /etc/selinux/config
> set "SELINUX=permissive"), but with the same results. "restorecon -rv" only
> changes /var/cache/dnf/* and /var/lib/ghc/pkg-dir.cache (not relevant for
> firewalld), and the following, which is not permanent over reboots:
> 
> Relabeled /run/user/0/dbus-1 from
> unconfined_u:object_r:session_dbusd_tmp_t:s0 to
> unconfined_u:object_r:user_tmp_t:s0
> Relabeled /run/user/0/dbus-1/services from
> unconfined_u:object_r:session_dbusd_tmp_t:s0 to
> unconfined_u:object_r:user_tmp_t:s0
> 
> There is a new package in selinux-policy-3.14.1-33.fc28 in koji, which
> includes a lots of fixes, but its build state is failed:
> https://koji.fedoraproject.org/koji/packageinfo?packageID=32
> 
> I have booted the system again in permissive mode, and attached the
> journalctl output.
> 
> As far as I understand the permissive mode, the denials are listed but
> allowed, so this should not be the reason of firewalld failure?

Can you try doing a fresh installation of Fedora Workstation, updating it, and seeing if you still get the SELinux denials and firewalld still fails to start? I'm wondering if something you're doing installing is changing the policy.

> 
> The only difference between working firewalld and failing firewalld are the
> kernels that I have used to boot, so I think something in the kernel has
> changed hat causes the problem. And yes, it may be an ordering problem - can
> this be caused by the kernel?

I'm not ruling out a kernel bug here, but there's nothing in the kernel log and there shouldn't be any SELinux denials happening. Let's narrow down on why those are occurring. If you can reproduce these SELinux denials on a clean Workstation installation, that's a problem we need to fix in the SELinux policy. If not, something you're installing is breaking the policy in some way and that should be fixed.

Comment 14 Edgar Hoch 2018-07-04 22:14:02 UTC
I have done a fresh installation of Fedora 28 with my kickstart file, this has worked fine, firewalld was running.

I then applied the contents of my after-kickstart scripts manually step by step. One part is to install additional packages.

I found out that firewalld fails to start if package python3-coverage_pth is installed, and it starts without error if I remove the package.

But it seams to me that package python3-coverage_pth doesn't have files that causes the problem, and it have not scripts and triggers. The description tells: "A .pth file to site-packages to enable coverage.py."


So it seams to activate python3-coverage.
I have removed python3-coverage, it also removed packages 
anaconda anaconda-core anaconda-gui anaconda-tui initial-setup initial-setup-gui kexec-tools-anaconda-addon.

As expected, firewalld was started normally.

Then I have installed python3-coverage_pth again, which required (only) python3-coverage. In this configuration firewalld also failed starting.

If I install one of packages anaconda anaconda-core anaconda-gui anaconda-tui initial-setup initial-setup-gui kexec-tools-anaconda-addon, they install package python3-coverage as a dependency, too.


I suppose that firewalld is influenced by python3-coverage it it is activated by python3-coverage_pth, because firewalld is written in python itself. It seems that it may depend on hardware, because I have python3-coverage_pth installed on other hosts too, and on these hosts firewalld is running.

The main difference to other hosts is that this has the following graphic card installed (line from lspci output):
01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1)


I am wondering if (and why) installing a package does have such consequences. It seams that installation of package python3-coverage_pth is like a flag which is set to on if the package is installed, and to off, if it is not installed. But this is - in my opinion - a misuse of a package. A better solution would be a program which can be called by an administrator to set the flag on or off, which writes something to a file (for example, in /etc or /var), or even a config file (in /etc) which can be changed by an administrator.


Now it seems that we know that an activated python3-coverage is blocking something in firewalld until it runs into timeout, but only on some hardware. It seems we need a deeper look into the code of firewalld and python3-coverage. But currently I don't know how to find where firewalld is waiting during booting.

Comment 15 Edgar Hoch 2018-07-04 22:15:54 UTC
I want to note that the new installation of comment #14 is running kernel-4.17.3-200.fc28.x86_64.

Comment 16 Edgar Hoch 2018-07-04 23:00:48 UTC
If you are wondering why package python3-coverage_pth is on my installation list:

I try to install as much (useful) packages as possible, in this case as many python packages as possible (some has conflicts and I have to exclude them), to provide our users nearly all available python modules. I don't want to add a package to the list every time a user will need it and send me a request. I want to install them once and then users can use them.

So I use wildcards in some package installation lists, and one pattern is "python[23]-py*" which includes python3-pytest-testmon which requires python3-coverage_pth, so this package was installed.

Comment 17 Jeremy Cline 2018-07-09 13:40:30 UTC
Okay, I'm going to assign this to coverage_pth as I'm quite confident it's not a kernel bug. I don't really know why anything depends on coverage_pth in the distribution, as it sounds like it's meant for Python virtualenvs and seems to just install a data file that enables test coverage (which doesn't make sense to do outside of development).

Since I don't understand the purpose of coverage_pth I'm not sure what the fix will be, but for the time being I'd recommend not installing coverage_pth.

Comment 18 Eric Garver 2018-07-11 17:45:13 UTC
Edgar, Can you add "--debug" to FIREWALLD_ARGS= in /etc/sysconfig/firewalld. Then reproduced it again and post /var/log/firewalld.

Comment 19 Eric Garver 2018-07-11 17:46:50 UTC
(In reply to Eric Garver from comment #18)
> Edgar, Can you add "--debug" to FIREWALLD_ARGS= in /etc/sysconfig/firewalld.
> Then reproduced it again and post /var/log/firewalld.

Scratch that. It's already attached to this BZ. Thanks!

Comment 20 Ben Cotton 2019-05-02 19:17:36 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Ben Cotton 2019-05-02 21:19:41 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Edgar Hoch 2019-05-02 22:22:38 UTC
I have no problems with Fedora 29, and coverage_pth doesn't exist in Fedora 29. I close this bug now.


Note You need to log in before you can comment on or make changes to this bug.