Bug 1497759

Summary: firewall-cmd slows down ifup/ifdown
Product: Red Hat Enterprise Linux 7 Reporter: Petr Horáček <phoracek>
Component: initscripts Assignee: David Kaspar // Dee'Kej <deekej>
Status: CLOSED ERRATA QA Contact: Daniel Rusek <drusek>
Severity: high Docs Contact:
Priority: high    
Version: 7.3 CC: blc, cww, danken, deekej, drusek, egarver, initscripts-maint-list, lpol, mkalinin, spower, todoleza, twoerner
Target Milestone: rc Keywords: Patch
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: initscripts-9.49.40-2.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 18:24:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1588456    
Bug Blocks: 1193083    
Attachments:
Description Flags
replace firewalld-cmd with send-dbus none

Description Petr Horáček 2017-10-02 15:29:29 UTC
Created attachment 1333269 [details]
replace firewalld-cmd with send-dbus

Description of problem:
The firewall-cmd calls in the ifup/ifdown scripts are far more expensive than the rest of the commands there.

ifup of a simple bridge (time in seconds, line number, script, line):
[0.1433262825012207, '314', 'ifup-eth', '/usr/bin/firewall-cmd --zone= --change-interface=net1']
[0.13783884048461914, '111', 'ifup-post', '/usr/bin/firewall-cmd --zone= --change-interface=net1']
[0.028711795806884766, '71', 'ifup', '. /etc/profile.d/lang.sh']
[0.0072748661041259766, '71', 'ifup-ipv6', '. /etc/profile.d/lang.sh']
[0.007008552551269531, '71', 'ifup-aliases', '. /etc/profile.d/lang.sh']
[0.006583452224731445, '71', 'ifup-post', '. /etc/profile.d/lang.sh']
[0.00649714469909668, '71', 'ifup-eth', '. /etc/profile.d/lang.sh']
[0.00396418571472168, '14', 'ifup', 'export PATH']
[0.0036084651947021484, '13', 'ifup', 'PATH=/sbin:/usr/sbin:/bin:/usr/bin']

ifdown:
[0.13937950134277344, '55', 'ifdown-post', '/usr/bin/firewall-cmd --remove-interface=net1']
[0.009960174560546875, '71', 'ifdown', '. /etc/profile.d/lang.sh']
[0.007417440414428711, '90', 'ifdown-eth', 'pidof -x dhclient']
[0.007174968719482422, '71', 'ifdown-ipv6', '. /etc/profile.d/lang.sh']
[0.006841897964477539, '71', 'ifdown-post', '. /etc/profile.d/lang.sh']
[0.006600856781005859, '71', 'ifdown-eth', '. /etc/profile.d/lang.sh']
[0.0027823448181152344, '188', 'ifdown-post', 'dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager/Settings org.freedesktop.NetworkManager.Settings.LoadConnections array:string:/etc/sysconfig/network-scripts/ifcfg-net1']

The first problem is that `firewall-cmd --zone= --change-interface=net1` is called twice during ifup: once in ifup-eth and once in ifup-post, although ifup-post is always executed after ifup-eth.

Another problem is the slowness of firewall-cmd itself. Replacing the firewall-cmd call with dbus-send makes it 16 times faster for ifup and 40 times faster for ifdown.

Patches fixing those problems are in the attachment.
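
For anyone who wants to repeat the comparison, here is a rough sketch of a timing helper. It is not how the numbers above were collected: the function name time_cmd_ms is hypothetical, and it assumes GNU date (for the %N nanosecond format).

```shell
# Hypothetical helper: run a command N times and print the mean wall-clock
# time per run in whole milliseconds. Assumes GNU date, which supports %N.
time_cmd_ms() {
    local runs i start end
    runs=$1
    shift
    i=0
    start=$(date +%s%N)
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output so only its run time matters.
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}

# Example usage on a host running firewalld (net1 is the bridge measured above):
#   time_cmd_ms 10 /usr/bin/firewall-cmd --zone= --change-interface=net1
```

Averaging over several runs smooths out the noise from a single invocation. The result includes process start-up, which is exactly the cost in question here.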
 

Version-Release number of selected component (if applicable):
CentOS Linux release 7.4.1708 (Core)
initscripts-9.49.39-1.el7.x86_64


How reproducible:
Always


Actual results:
Most of the ifup/ifdown time is spent in firewall-cmd calls.


Expected results:
Time spent on firewall configuration should be reasonable compared to the rest of the setup.

Comment 2 David Kaspar // Dee'Kej 2017-10-02 15:49:56 UTC
Thank you, Petr. The patch needs some cleanup, but the functionality looks good to me...

We were already facing a problem with ifup/ifdown being too slow when calling 'nmcli' directly, and we switched to using D-Bus calls instead. So I'm okay if we fix this the same way to speed things up.

However, I will have to discuss with Lukas why we had the calls to firewall-cmd twice. There might have been a reason for it, and we don't want to break anything.

Another thing - I might be wrong, but just off the top of my head - was there not a plan to replace D-Bus with something else in the future?

Lastly - I see Dan is proposing a RHEL-7.4.z fix. Is there any specific case where this is causing big problems?

Comment 3 Dan Kenigsberg 2017-10-03 07:32:31 UTC
(In reply to David Kaspar [Dee'Kej] from comment #2)
> 
> Lastly - I see Dan is proposing a RHEL-7.4.z fix. Is there any specific case
> where this is causing big problems?

Multiple RHV customers have asked to attach many (tens or hundreds of) networks to a host in a single RHV command. RHV commands time out after 2 minutes, which limits the number of networks (each with a bridge and a VLAN) that we can attach. But let us first fix it in 7.5. Only if the fix is as simple as Petr suggests shall we consider the implications of a backport.

Comment 4 David Kaspar // Dee'Kej 2017-10-03 10:14:39 UTC
(In reply to Dan Kenigsberg from comment #3)
> (In reply to David Kaspar [Dee'Kej] from comment #2)
> > 
> > Lastly - I see Dan is proposing a RHEL-7.4.z fix. Is there any specific case
> > where this is causing big problems?
> 
> RHV receives multiple customers requesting to attach multiple (tens,
> hundreds) of networks to a host in one RHV command. RHV's commands timeout
> after 2 minutes, which limits the number of networks (each with a bridge and
> a vlan) we can attach. But let us first fix it in 7.5. Only if the fix is as
> simple as Petr suggests, shall we consider the implications of a backport.

Oh, I see. IMHO that is a good justification for both 7.5 and 7.4.z.

Adding needinfo on our colleagues for them to provide necessary ACKs if they agree.

Comment 5 David Kaspar // Dee'Kej 2017-10-03 10:29:46 UTC
After discussion with Lukas, we would first like to know why firewalld is slow in the first place, and whether we could fix it there.

Comment 6 Eric Garver 2017-10-03 14:23:17 UTC
(In reply to David Kaspar [Dee'Kej] from comment #5)
> After discussion with Lukas, we would like to first know why is the
> firewalld slow in the first place, and could we fix it there?

Short answer: No.

Python is an interpreted language. When the program starts, it has to go through the Python interpreter. There is not much you can do to improve the startup time.

Python does automatically compile the library code to byte code, which helps some. This can also be done for the firewall-cmd executable, but my tests show it only brings a 10-20ms (0.255s --> 0.226s) improvement for "firewall-cmd --list-all".

Comment 7 David Kaspar // Dee'Kej 2017-10-04 11:39:03 UTC
(In reply to Eric Garver from comment #6)
> Python is an interpreted language. When the program start it has to go
> through the python interpreter. There is not much you can do to improve the
> startup time.
> 
> Python does automatically compile the library code to byte code, which helps
> some. This can also be done for the firewall-cmd executable, but my tests
> show it only brings a 10-20ms (0.255s --> 0.226s) improvement for
> "firewall-cmd --list-all".

Thank you for the clarification, Eric. I have one more favor to ask you - could you please look at the patches suggested by Petr, and confirm that these D-Bus calls are equivalent to the original firewall-cmd calls?

Thank you!

Comment 8 Eric Garver 2017-10-04 13:22:54 UTC
(In reply to David Kaspar [Dee'Kej] from comment #7)
> (In reply to Eric Garver from comment #6)
> > Python is an interpreted language. When the program start it has to go
> > through the python interpreter. There is not much you can do to improve the
> > startup time.
> > 
> > Python does automatically compile the library code to byte code, which helps
> > some. This can also be done for the firewall-cmd executable, but my tests
> > show it only brings a 10-20ms (0.255s --> 0.226s) improvement for
> > "firewall-cmd --list-all".
> 
> Thank you for clarification, Eric. I have one more favor to ask you - could
> you please look at the patches suggested by Petr, and confirm that these
> D-Bus calls are equivalent to the original firewalld-cmd calls?
> 
> Thank you!

They are not correct. The removeInterface command is missing the zone argument.

If you care about the return code, you should add --print-reply and redirect stdout to /dev/null. This makes dbus-send block until it gets a reply and return an appropriate exit code. As it is now, the dbus-send call is fire-and-forget, which also explains why it is so much faster than firewall-cmd.

Correct versions:

  dbus-send --print-reply --system --dest=org.fedoraproject.FirewallD1 \
            /org/fedoraproject/FirewallD1 \
            org.fedoraproject.FirewallD1.zone.changeZoneOfInterface \
            string:"${ZONE}" string:"${DEVICE}" \
            > /dev/null
  # do something with $?

Note: An empty string as the first argument means the "default" zone, but both arguments must always be passed.

  dbus-send --print-reply --system --dest=org.fedoraproject.FirewallD1 \
            /org/fedoraproject/FirewallD1 \
            org.fedoraproject.FirewallD1.zone.removeInterface \
            string:"" string:"${DEVICE}" \
            > /dev/null
  # do something with $?
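
One possible way to "do something with $?" is sketched below. It is only an illustration, not the code from the patch: the function name firewalld_remove_interface and the DBUS_SEND override variable are hypothetical, and DBUS_SEND exists only so the function can be tried without a running firewalld.

```shell
# Hypothetical wrapper around the removeInterface call shown above.
# DBUS_SEND is an assumed override hook; it defaults to the real dbus-send.
firewalld_remove_interface() {
    local device
    device=$1
    # --print-reply makes dbus-send wait for the reply, so its exit status
    # reflects whether the call succeeded.
    if "${DBUS_SEND:-dbus-send}" --print-reply --system \
            --dest=org.fedoraproject.FirewallD1 \
            /org/fedoraproject/FirewallD1 \
            org.fedoraproject.FirewallD1.zone.removeInterface \
            string:"" string:"${device}" > /dev/null; then
        return 0
    fi
    echo "firewalld: could not remove ${device} from its zone" >&2
    return 1
}
```

A caller could then write `firewalld_remove_interface "${DEVICE}" || true` if a firewalld failure should not abort ifdown; whether that is the right policy is a separate decision.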

Comment 9 David Kaspar // Dee'Kej 2017-10-10 15:53:21 UTC
Pull-request submitted for review:
https://github.com/fedora-sysv/initscripts/pull/132

Comment 10 David Kaspar // Dee'Kej 2017-10-16 11:57:17 UTC
Based on the follow-up discussion in the pull-request, I have updated the D-Bus call in the ifup-eth script to be *synchronous* (for IPv6 DHCP setups). The rest of the calls remain asynchronous, for the maximum speed increase.
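
The difference between the two modes can be sketched roughly as follows. This is not the code from the pull-request: the function name firewalld_zone_call and the DBUS_SEND override are hypothetical. Per the dbus-send man page, --print-reply blocks for the reply and implies a method call, whereas without it the message type defaults to "signal", so the asynchronous variant passes --type=method_call explicitly.

```shell
# Hypothetical helper: call a FirewallD zone method either synchronously
# ("sync") or fire-and-forget (any other mode value).
# DBUS_SEND is an assumed override hook; it defaults to the real dbus-send.
firewalld_zone_call() {
    local mode method zone device
    mode=$1; method=$2; zone=$3; device=$4
    # Arguments shared by both variants.
    set -- --system --dest=org.fedoraproject.FirewallD1 \
        /org/fedoraproject/FirewallD1 \
        "org.fedoraproject.FirewallD1.zone.${method}" \
        "string:${zone}" "string:${device}"
    if [ "$mode" = sync ]; then
        # Wait for firewalld to answer; the exit status reflects the result.
        "${DBUS_SEND:-dbus-send}" --print-reply "$@" > /dev/null
    else
        # Send the method call and return without waiting for a reply.
        "${DBUS_SEND:-dbus-send}" --type=method_call "$@"
    fi
}
```

Following the description above, ifup-eth would use something like `firewalld_zone_call sync changeZoneOfInterface "${ZONE}" "${DEVICE}"`, while the remaining scripts would use the asynchronous mode.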

 -- David --

Comment 18 errata-xmlrpc 2018-04-10 18:24:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0983