Bug 1294974 - slow boot for firewalld
Summary: slow boot for firewalld
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: firewalld
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Thomas Woerner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-12-31 11:58 UTC by Artem
Modified: 2016-12-20 17:30 UTC (History)
7 users (show)

Fixed In Version: firewalld-0.4.0-2.fc23
Clone Of: 1112924
Environment:
Last Closed: 2016-12-20 17:30:54 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
logs (402.36 KB, application/zip)
2015-12-31 11:58 UTC, Artem
no flags Details
systemd-analyze dump (527.86 KB, text/plain)
2016-01-04 14:08 UTC, Artem
no flags Details
systemd-analyze dump (523.27 KB, text/plain)
2016-01-14 19:14 UTC, Artem
no flags Details
systemd-analyze plot (101.54 KB, image/svg+xml)
2016-01-14 19:14 UTC, Artem
no flags Details
systemd-analyze dump (525.59 KB, text/plain)
2016-02-12 16:02 UTC, Artem
no flags Details
systemd-analyze plot (104.29 KB, image/svg+xml)
2016-02-12 16:02 UTC, Artem
no flags Details
output of systemd-bootchart showing process activity during boot (1.54 MB, image/svg+xml)
2016-02-17 16:20 UTC, Oliver Henshaw
no flags Details
output of systemd-analyze plot showing corresponding service activity during boot (150.10 KB, image/svg+xml)
2016-02-17 16:21 UTC, Oliver Henshaw
no flags Details
"strace -tt -T -C" of firewalld service launch during boot (1004.71 KB, text/plain)
2016-02-19 15:45 UTC, Oliver Henshaw
no flags Details
'/usr/bin/strace -tt -T' of firewalld service launch during boot (F23 SSD) (883.33 KB, text/plain)
2016-06-24 17:06 UTC, Oliver Henshaw
no flags Details

Description Artem 2015-12-31 11:58:33 UTC
Created attachment 1110759 [details]
logs

+++ This bug was initially created as a clone of Bug #1112924 +++

Description of problem:


Version-Release number of selected component (if applicable):
firewalld-0.3.14.2-4.fc23.noarch

How reproducible:
boot system

Steps to Reproduce:
1. Boot system


Actual results:

root@nuc log]# systemd-analyze blame
         11.678s firewalld.service
...

Expected results:
firewalld.service take less time

Additional info:

- firewall-cmd --reload takes under 2 seconds, only the first start takes long


--- Additional comment from Jiri Popelka on 2014-06-25 03:15:15 EDT ---

(In reply to lionghostshop from comment #0)
>          12.997s firewalld.service

That's a lot, yes.
Please add 'FIREWALLD_ARGS=--debug' into /etc/sysconfig/firewalld
then reboot and 
- attach output of 'journalctl -b'
- attach /var/log/firewalld
- attach files from /etc/firewalld/zones/ if there are any

thanks

--- Additional comment from  on 2014-06-25 03:41:06 EDT ---

required files, I zip it since there are a few

--- Additional comment from  on 2014-06-25 03:48:11 EDT ---

Note that I am using btrfs in /.
Previously, I found rsyslog is very slow using btrfs. I disable it and reduce boot time from 3 minutes to 32 seconds. 
After booting, applications start in reasonable time and are fast enough.

--- Additional comment from Jiri Popelka on 2014-06-25 04:53:24 EDT ---

journalctl:
15:35:16 systemd[1]: Starting firewalld
15:35:23 polkitd[769]: Started polkitd version 0.112
15:35:30 systemd[1]: Started firewalld

/var/log/firewalld
15:35:30 DEBUG1: start()

My rough guess is that the whole 14s before firewalld actually started to "do something" had taken the initialization of things that firewalld requires, i.e. D-Bus, Polkit, loading of kernel netfilter modules etc.

--- Additional comment from Thomas Woerner on 2014-06-25 08:20:25 EDT ---

The start times that systemd is reporting for a service is not only the start time of the service itself, but also the start time of other services, that are required by this service - if it is the first service that requires this  service. 
But the start times of other services that have the same requirement are not increased by the start times of the requirement. Therefore the start times reported by systemd are only some numbers and not suitable to compare or blame services.

There are start times for lots of requirements only added to firewalld, because it is started very early.

You can simply verify the whole blame issue by deactivating the service that is taking most time as reported by systemd-analyze blame over and over again. You will notice that other services need more time to start after deactivating the most consuming service and the start time of the system is not reduced a lot.

You system will boot faster in the end, after some iterations, because there are no services started anymore, but it will only be of limited use...

--- Additional comment from Oliver Henshaw on 2015-03-26 14:47:11 EDT ---

I see this too and am not convinced by comment #5. firewalld would be started after any explicit dependencies specified in the .service file and I don't see any particular smoking guns for implicit dependencies in the journal output.

This is a low-powered machine (an AMD E-350) with classical hard drives so I'm not expecting blazing fast startup but still the startup time of firewalld.service is out of proportion to other services.

So I found if I set FIREWALLD_ARGS=--debug=10 in /etc/sysconfig/firewalld I got a bit more logging output between "Starting firewalld" and "Started firewalld"

I also added a snippet to make firewalld log to journalctl as well as its own file:

$ cat /etc/systemd/system/firewalld.service.d/logging.conf 
[Service]
# set back to defaults
# it logs debug output to /valor/log/firewalld.log anyways, this just helps line timestamps up
StandardOutput=journal
StandardError=inherit


The results are attached - the 10 seconds between firewalld starting and being started is on the high end for the machine but it's interesting that most of that is before the iptables calls.

I then sprinkled in some debugging lines in /usr/sbin/firewalld and found that most of the time happens before the first non-comment line of the file.

So the cause could actually be slowness in python startup?

--- Additional comment from Fedora End Of Life on 2015-05-29 08:12:46 EDT ---

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Fedora End Of Life on 2015-06-29 17:18:41 EDT ---

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 1 Thomas Woerner 2016-01-04 11:22:04 UTC
Please add the following lines to the firewalld.service file in /usr/lib/systemd/system/firewalld.service:

After=syslog.target
After=polkit.service
After=abrtd.service

These lines make sure that the start time of polkit, syslog and abrtd are not added to the start time of firewalld.

Please report back if this is helping in your case and please attach the output of "systemd-analyze dump"

There is also something strange in your log: NetworkManager is started at the same time as firewalld. But there is Before=NetworkManager.service in the firewalld.service file, which seems to be ignored.

Comment 2 Thomas Woerner 2016-01-04 11:57:36 UTC
Please ignore the comment about NetworkManager.service.

Comment 3 Artem 2016-01-04 14:07:26 UTC
Seems to have a positive effect, at least now it's just 5s

[gtema@nuc ~]$ systemd-analyze blame          
         11.102s vncserver@:0.service
          8.819s postgresql.service
          7.508s packagekit.service
          7.497s dev-sda2.device
          7.445s systemd-udev-settle.service
          5.233s chronyd.service
          5.122s firewalld.service
          5.101s systemd-logind.service
          5.100s upower.service
          5.048s avahi-daemon.service
          4.991s rsyslog.service
          4.973s rtkit-daemon.service
          3.868s abrtd.service
          3.547s wpa_supplicant.service
          3.510s systemd-journal-flush.service
          3.241s accounts-daemon.service
          2.851s systemd-udevd.service
          2.005s cups.service
          1.876s polkit.service
          1.570s abrt-ccpp.service
          1.173s plymouth-start.service
          1.069s systemd-fsck
          1.003s dmraid-activation.service

Comment 4 Artem 2016-01-04 14:08:04 UTC
Created attachment 1111490 [details]
systemd-analyze dump

Comment 5 Jan Synacek 2016-01-05 11:37:00 UTC
firewalld, polkit and abrt are all dbus activated. It doesn't really matter if you make the ordering explicit (by using the After=), or leave it implicit. What you did is that you made the abrt and polkit services *explicitly* start *before* firewalld. Then, you just compare "boot times" of the firewalld itself. 

Because the original time line doesn't show polkit and abrt, it's not directly visible. Please compare the times of all three services summed up. That should be, in both cases, roughly the same. Maybe on old ha

Comment 6 Jan Synacek 2016-01-05 11:38:35 UTC
... rdware, with less CPUs for example, the overall boot time might be larger.

Comment 7 Artem 2016-01-09 12:02:28 UTC
Booted today I have the following blame:

[gtema@nuc ~]$ systemd-analyze blame
         11.265s firewalld.service
          9.779s accounts-daemon.service
          8.264s systemd-udev-settle.service
          8.123s dev-sda2.device
          7.553s postgresql.service
          6.702s abrtd.service
          4.463s packagekit.service
          4.365s systemd-journal-flush.service
          4.134s rsyslog.service
          3.838s upower.service
          3.091s systemd-udevd.service
          2.399s polkit.service
          1.599s systemd-tmpfiles-setup-dev.service
          1.527s cups.service
          1.281s colord.service
          1.169s avahi-daemon.service
          1.014s systemd-sysctl.service
           914ms abrt-ccpp.service
          ...

So firewall+abrtd+polkit ~21s
firewalld.service still has
After=syslog.target
After=polkit.service
After=abrtd.service

Comment 8 Thomas Woerner 2016-01-11 13:24:17 UTC
It seems there is something else that is added to the start time of firewalld. Please add the output of "systemd-analyze dump" and "systemd-analyze plot" again

Comment 9 Artem 2016-01-14 19:13:42 UTC
It does not occur always, normally start time takes about 5s (what is still quite long). But finally I got again ~10s

Comment 10 Artem 2016-01-14 19:14:14 UTC
Created attachment 1114935 [details]
systemd-analyze dump

Comment 11 Artem 2016-01-14 19:14:41 UTC
Created attachment 1114936 [details]
systemd-analyze plot

Comment 12 Fedora Update System 2016-02-04 15:41:04 UTC
firewalld-0.4.0-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-fc0691e6a7

Comment 13 Fedora Update System 2016-02-05 01:23:35 UTC
firewalld-0.4.0-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-fc0691e6a7

Comment 14 Fedora Update System 2016-02-08 13:28:53 UTC
firewalld-0.4.0-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-fc0691e6a7

Comment 15 Fedora Update System 2016-02-09 22:27:45 UTC
firewalld-0.4.0-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-fc0691e6a7

Comment 16 Artem 2016-02-12 16:00:51 UTC
not sure it is better now:

[gtema@nuc system]$ systemd-analyze 
Startup finished in 727ms (kernel) + 2.705s (initrd) + 44.387s (userspace) = 47.820s
[gtema@nuc system]$ systemd-analyze blame
         12.633s firewalld.service
         11.311s systemd-udev-settle.service
         10.251s abrtd.service
          9.977s dev-sda2.device
          9.497s accounts-daemon.service
          7.876s postgresql.service
          6.650s systemd-journal-flush.service
          5.164s packagekit.service
          5.015s systemd-journald.service
          4.575s avahi-daemon.service
          4.527s rsyslog.service
          4.226s rtkit-daemon.service
          4.199s upower.service
          3.763s systemd-udevd.service
          3.476s cups.service
          2.564s polkit.service
          2.517s dnf-makecache.service
          1.935s systemd-fsck
          1.685s sshd.service
          1.398s wpa_supplicant.service
          1.387s systemd-tmpfiles-setup-dev.service
          1.337s udisks2.service
          1.280s abrt-ccpp.service
          1.212s chronyd.service
          1.173s systemd-rfkill
          1.068s plymouth-start.service
          1.049s systemd-sysctl.service

Comment 17 Artem 2016-02-12 16:02:13 UTC
Created attachment 1123566 [details]
systemd-analyze dump

Comment 18 Artem 2016-02-12 16:02:40 UTC
Created attachment 1123568 [details]
systemd-analyze plot

Comment 19 Oliver Henshaw 2016-02-17 16:18:36 UTC
Firewalld is slower on F22 than it was in F21 on my hdd system. The way several services take more than ten seconds reminds me of the old journal flushing bugs, e.g. bug #1006386 and duplicates. Perhaps the lack of the readahead service is to blame for this? Either there's more random io swamping the system and slowing boot or something is getting delayed and other processes are blocking on it.

A another ssd systemd, firewalld starts faster in a real boot and in VMS but still contributes more time than most other services. I expect the above update in F23 may help there, but haven't tested yet.


Below is a bootchart from systemd-bootchart and accompanying systemd-analyze output. Note that systemd-bootchart is started as a service, as it seems to finish too early when started as init, but it still starts early enough to show activity during firewalld activation.

There doesn't seem to be much iowait during firewalld activation afaict, though there is a lot of read activity all through boot. setroubleshootd looks suspicious, maybe?

systemd-219-27.fc22.x86_64
firewalld-0.3.14.2-4.fc22.noarch

Comment 20 Oliver Henshaw 2016-02-17 16:20:05 UTC
Created attachment 1127976 [details]
output of systemd-bootchart showing process activity during boot

Comment 21 Oliver Henshaw 2016-02-17 16:21:53 UTC
Created attachment 1127978 [details]
output of systemd-analyze plot showing corresponding service activity during boot

Comment 22 Oliver Henshaw 2016-02-19 15:45:14 UTC
Created attachment 1128573 [details]
"strace -tt -T -C" of firewalld service launch during boot

Here's an strace of F22 firewalld startup during boot. Is there anything suspicious in there? I know it's not as useful as an strace from F23 but maybe there's still some unexpected waits?

Comment 23 Oliver Henshaw 2016-02-19 16:15:38 UTC
As I observed in bug #1112924 comment #6, python startup is very slow. It takes around 12 seconds to reach stat("/usr/lib/python2.7/site-packages/firewall/config") when "from firewall import config" is the seventh line of /usr/sbin/firewalld.

Comment 24 Fedora Update System 2016-02-21 16:30:14 UTC
firewalld-0.4.0-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Oliver Henshaw 2016-02-22 14:06:13 UTC
Not fixed according to comment #16.

Comment 26 Artem 2016-02-24 13:21:43 UTC
just confirm one more time, that the bug is NOT fixed

Comment 27 Oliver Henshaw 2016-06-24 17:06:00 UTC
Created attachment 1171998 [details]
'/usr/bin/strace -tt -T' of firewalld service launch during boot (F23 SSD)

So the firewalld statup time improved to about 2.3 seconds after fitting the machine with an SSD. This still seems a little on the high side. Upgrading to F23 shows no further improvement.

Repeating the strace logging shows that firewalld still spends a high proportion of its startup in very early python imports. With 
# ls /etc/systemd/system/firewalld.service.d/
orderring.conf  strace.conf
# cat orderring.conf
[Unit]
Requires=polkit.service abrtd.service
After=polkit.service abrtd.service
Before=tuned.service
# cat strace.conf
[Service]
ExecStart=
ExecStart=/usr/bin/strace -tt -T -o /run/log/firewalld /usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS

'systemd-analyze blame' reports "3.653s firewalld.service" and the corresponding strace log is attached.

I see 0.6 seconds between stat("/usr/bin/lib64/python3.4/os.py") and stat("/usr/lib64/python3.4/site-packages/dbus/...") and a further 0.4 seconds until stat("/usr/lib/python3.4/site-packages/firewall/config/...")

So it takes about a second to reach the first firewalld-related import and "import os\n import sys" by itself takes over half a second.

So based on this and comment #23, there's something in python startup that's heavily IO-bound on HDD, and possibly noticeably IO bound or CPU-bound on SSD.

Comment 28 Oliver Henshaw 2016-06-28 14:11:22 UTC
(In reply to Oliver Henshaw from comment #27)
> So based on this and comment #23, there's something in python startup that's
> heavily IO-bound on HDD, and possibly noticeably IO bound or CPU-bound on
> SSD.

Or perhaps it's not so simple. The CPU wait (overall or per-cpu) chart from bootchart is slightly misleading as it comes from /proc/schedstat and shows the total time spent waiting to run by all tasks - i.e. it includes times when another process is running. AIUI the iowait time from /proc/stat would be more useful here, probably the only things to worry about in bootchart are times when CPU utilisation is < 100% and wait-time is > 0%.

So even though firewalld spends a lot of time waiting, other services like Xorg or abrt ancillary services are making progress in this time. If I order firewalld after polkit.service, abrtd.service, abrt-xorg.service, abrt-oops.service, abrt-ccpp.service and before sddm.service then some parts of early firewalld startup do happen much quicker but overall boot time doesn't change much.

So I should relax all the ordering constraints (which I added to isolate and debug the problem) and let firewalld start in parallel with any other services and let the boot-time be as efficient as possible.

A system with a faster CPU might become io-limited though, even with a SSD.

Comment 29 Fedora End Of Life 2016-11-24 14:37:28 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 30 Fedora End Of Life 2016-12-20 17:30:54 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.