Bug 1835315

Summary: adding many set elements with --echo is very slow
Product: Red Hat Enterprise Linux 8
Reporter: Eric Garver <egarver>
Component: nftables
Assignee: Phil Sutter <psutter>
Status: CLOSED DUPLICATE
QA Contact: qe-baseos-daemons
Severity: medium
Priority: unspecified
Version: 8.3
CC: berend.de.schouwer, egarver, extras-qa, hobbes1069, kevin, psutter, samuel-rhbugs, todoleza
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Clone Of: 1834853
Last Closed: 2020-05-13 15:23:39 UTC
Type: Bug
Attachments:
  json blob to trigger slow adding of elements

Description Eric Garver 2020-05-13 15:17:42 UTC
+++ This bug was initially created as a clone of Bug #1834853 +++

Description of problem:
After upgrading from Fedora 31 to 32, I noticed that firewalld was hanging and pegging one CPU core at 100%. There was no useful output in the logs because stdout/stderr are redirected to /dev/null (which is a separate issue).

After running it manually with --debug, I noticed it was hanging while trying to load my blacklist ipset.

After I removed the blacklist.xml files from /etc/firewalld/ipsets, firewalld restarted normally, but as soon as I re-created my blacklist ipset, firewalld hung again.


Version-Release number of selected component (if applicable):
firewalld-0.8.2-2.fc32.noarch


Additional information:
The blacklist ipset was created and filled with per-country network lists from ipdeny using the following commands:

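# Create the permanent ipset (IPv4 networks, up to 200000 elements).
firewall-cmd -q --permanent --new-ipset=blacklist --type=hash:net \
    --option=family=inet --option=hashsize=4096 --option=maxelem=200000 \
    --set-description="An ipset list of networks or ips to be dropped."

# $countries holds the ipdeny country codes (e.g. "cn"); each
# ./<code>.zone file lists the networks for that country.
for country in $countries; do
    firewall-cmd -q --permanent --ipset=blacklist \
        --add-entries-from-file="./${country}.zone"
done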
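Since the slowdown grows with the number of set elements, it can help to know how many elements the loop above actually loads. Below is a minimal Python sketch. It assumes each .zone file holds one address or network per line, skips blank and '#' lines, counts distinct entries across all files, and checks the total against the maxelem=200000 limit. The helper names are hypothetical and not part of firewalld.

```python
from pathlib import Path
from typing import Iterable

MAXELEM = 200000  # the --option=maxelem value used when creating the ipset


def count_entries(paths: Iterable[str]) -> int:
    """Count distinct entries across ipset entry files.

    Assumes one address or network per line; blank lines and lines
    starting with '#' are skipped. Entries are compared as plain strings,
    so different spellings of the same network are not merged.
    """
    entries = set()
    for path in paths:
        for line in Path(path).read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                entries.add(line)
    return len(entries)


def fits_in_set(paths: Iterable[str], maxelem: int = MAXELEM) -> bool:
    """Return True if the combined distinct entries do not exceed maxelem."""
    return count_entries(paths) <= maxelem
```

For example, `count_entries(["cn.zone", "ru.zone"])` gives the number of elements those two files would contribute.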

--- Additional comment from Eric Garver on 2020-05-12 19:06:35 UTC ---

I was able to reproduce this. Attaching strace while CPU utilization is high shows lots of netlink activity (i.e. communication with the kernel/nftables via libnftables). CPU utilization should drop once the set has finished applying to nftables. For the "cn" zone it takes ~22 seconds on a modest VM.

Of course, with IndividualCalls=yes in firewalld.conf it's much, much worse, but that's a non-default setting meant only for debugging.

--- Additional comment from Richard Shaw on 2020-05-12 19:13:01 UTC ---

In my case the high CPU utilization never seems to end (I waited at least an hour), and both the firewall-config GUI and firewall-cmd hang, unable to communicate with firewalld. I'm currently blacklisting 22 countries.

--- Additional comment from Eric Garver on 2020-05-13 12:14:35 UTC ---

This is still being investigated. In the meantime, a workaround is to revert to the iptables backend by setting FirewallBackend=iptables in firewalld.conf.
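The workaround changes a single KEY=value line in firewalld.conf. Below is a minimal Python sketch of that edit. It assumes the file uses plain KEY=value lines with '#' comments. The hypothetical helper replaces any active FirewallBackend= line, or appends one if none exists, and leaves commented-out lines alone.

```python
import re

# Matches an active (uncommented) FirewallBackend= line.
_BACKEND_LINE = re.compile(r"^FirewallBackend=.*$", re.MULTILINE)


def set_firewall_backend(conf_text: str, backend: str) -> str:
    """Return firewalld.conf text with FirewallBackend set to `backend`."""
    new_line = f"FirewallBackend={backend}"
    if _BACKEND_LINE.search(conf_text):
        # A function replacement avoids backslash-escaping surprises in re.sub.
        return _BACKEND_LINE.sub(lambda _m: new_line, conf_text)
    # No active setting yet: append one, keeping the file newline-terminated.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"
```

The returned text would be written back to the config file (commonly /etc/firewalld/firewalld.conf, as root). firewalld then has to pick up the change, for example by restarting it.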

--- Additional comment from Eric Garver on 2020-05-13 14:13:40 UTC ---

Could you try the patch below? It disables echo (and handle) output when restoring sets, which seems to be the cause of the slow set apply. The problem can also be reproduced outside of firewalld:

# nft flush ruleset; time nft  --handle -j -f /tmp/json.nft
real    0m0.496s

# nft flush ruleset; time nft -e  --handle -j -f /tmp/json.nft > /dev/null
real    0m20.356s


--->8---


# cd /usr/lib/python3.8/site-packages/firewall
# patch -p0 < /tmp/patch
patching file core/nftables.py

/tmp/patch:

--- core/nftables.py            10:05:28.159696921 -0400
+++ core/nftables.py.new        2020-05-13 10:06:04.622696921 -0400
@@ -1756,8 +1756,12 @@
     def set_restore(self, set_name, type_name, entries,
                     create_options=None, entry_options=None):
         rules = []
+        self.nftables.set_echo_output(False)
+        self.nftables.set_handle_output(False)
         rules.extend(self.build_set_create_rules(set_name, type_name, create_options))
         rules.extend(self.build_set_flush_rules(set_name))
         for entry in entries:
             rules.extend(self.build_set_add_rules(set_name, entry))
         self.set_rules(rules, self._fw.get_log_denied())
+        self.nftables.set_echo_output(True)
+        self.nftables.set_handle_output(True)
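The patch turns echo and handle output back on only if set_rules() returns normally. If it raises, the Nftables object stays with both disabled. Below is a minimal sketch of a variant that restores the previous settings in every case, written as a context manager. It assumes the object also offers get_echo_output() and get_handle_output(), as the libnftables Python bindings do. The helper name is hypothetical, and this is an illustration, not part of the patch above.

```python
from contextlib import contextmanager


@contextmanager
def echo_and_handle_disabled(nft):
    """Temporarily disable echo and handle output on an Nftables-like object.

    `nft` is assumed to provide get_/set_echo_output() and
    get_/set_handle_output(). The previous values are restored on exit,
    including when the body raises an exception.
    """
    old_echo = nft.get_echo_output()
    old_handle = nft.get_handle_output()
    nft.set_echo_output(False)
    nft.set_handle_output(False)
    try:
        yield nft
    finally:
        nft.set_echo_output(old_echo)
        nft.set_handle_output(old_handle)
```

Inside set_restore(), the two pairs of set_*_output() calls would then become `with echo_and_handle_disabled(self.nftables):` wrapped around the `self.set_rules(rules, self._fw.get_log_denied())` call. Restoring the saved values rather than hard-coding True also keeps the helper correct if echo was off to begin with.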

--- Additional comment from Eric Garver on 2020-05-13 14:18:31 UTC ---

Don't forget to _fully_ restart firewalld (a reload is not enough) to pick up the code change:

[root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload
success

real    0m22.153s
user    0m0.190s
sys     0m0.036s

[root@vmhost-fedora-test1 firewall]# systemctl restart firewalld
[root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload
success

real    0m1.774s
user    0m0.199s
sys     0m0.032s

--- Additional comment from Eric Garver on 2020-05-13 15:03:29 UTC ---

This has been addressed by an upstream nftables patch. My testing shows it makes at least a 10x improvement, so I'm reassigning this to nftables.

--

# time firewall-cmd --reload
success
real    0m20.585s

# git am /tmp/mbox
Applying: JSON: Improve performance of json_events_cb()
# make install

# systemctl restart firewalld
# time firewall-cmd --reload
success
real    0m2.643s

Comment 1 Eric Garver 2020-05-13 15:22:43 UTC
Created attachment 1688094
json blob to trigger slow adding of elements

Comment 2 Phil Sutter 2020-05-13 15:23:39 UTC

*** This bug has been marked as a duplicate of bug 1835300 ***