1835315 – adding many set elements with --echo is very slow



Keywords:
Status:	CLOSED DUPLICATE of bug 1835300
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	nftables
Sub Component:
Version:	8.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	8.0
Assignee:	Phil Sutter
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-05-13 15:17 UTC by Eric Garver
Modified:	2020-05-13 15:23 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1834853
Environment:
Last Closed:	2020-05-13 15:23:39 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
json blob to trigger slow adding of elements (11.95 MB, text/plain) 2020-05-13 15:22 UTC, Eric Garver

Description Eric Garver 2020-05-13 15:17:42 UTC

+++ This bug was initially created as a clone of Bug #1834853 +++

Description of problem:
After upgrading from Fedora 31 to 32 I noticed that firewalld was hanging using 100% CPU of one core. There was no useful output in the logs due to the stdout/stderr being redirected to null (which is a separate issue).

After running manually with --debug I noticed it was hanging while trying to load my blacklist ipset.

Removing the blacklist.xml files from /etc/firewalld/ipset made firewalld restart normally, but as soon as I reinstalled my blacklist ipset, firewalld hung again.


Version-Release number of selected component (if applicable):
firewalld-0.8.2-2.fc32.noarch


Additional information:
Blacklist ipsets added using networks from ipdeny for specific countries using the following command:

firewall-cmd -q --permanent --new-ipset=blacklist --type=hash:net \
    --option=family=inet --option=hashsize=4096 --option=maxelem=200000 \
    --set-description="An ipset list of networks or ips to be dropped."

for country in $countries; do
    firewall-cmd -q --permanent --ipset=blacklist \
        --add-entries-from-file=./$country.zone
done

--- Additional comment from Eric Garver on 2020-05-12 19:06:35 UTC ---

I was able to reproduce this. Attaching strace while CPU utilization is high shows lots of netlink activity (i.e. communicating with the kernel/nftables via libnftables). CPU utilization should drop once the set has finished applying to nftables. For the "cn" zone it takes ~22 seconds on a modest VM.

Of course, if you have IndividualCalls=yes in firewalld.conf it's much, much worse. But that's a non-default setting and only meant for debugging.

--- Additional comment from Richard Shaw on 2020-05-12 19:13:01 UTC ---

The high CPU utilization never seems to end (I waited at least an hour), and both the firewall-config GUI and firewall-cmd hang, unable to communicate with firewalld. I'm currently blacklisting 22 countries.

--- Additional comment from Eric Garver on 2020-05-13 12:14:35 UTC ---

This is still being investigated. In the meantime a workaround is to revert to the iptables backend by setting FirewalldBackend=iptables in firewalld.conf.

--- Additional comment from Eric Garver on 2020-05-13 14:13:40 UTC ---

Any chance you could try the below patch? It disables echo support when restoring sets. This seems to be the cause of the slow set apply. It can be reproduced outside of firewalld.

# nft flush ruleset; time nft  --handle -j -f /tmp/json.nft
real    0m0.496s

# nft flush ruleset; time nft -e  --handle -j -f /tmp/json.nft > /dev/null
real    0m20.356s
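
For anyone without access to the attachment, here is a minimal sketch of how a comparable input could be generated. It is not the attached blob: the table name, set name, and 10.x.y.0/24 addresses are made up for illustration, and the command layout is an assumption based on the libnftables JSON schema, with one "add element" command per entry to mirror the per-entry loop in firewalld's set_restore().

```python
import ipaddress
import json


def build_set_ruleset(count, table="repro", set_name="blacklist"):
    """Build a libnftables-style JSON ruleset that adds `count` /24 networks.

    Table name, set name and addresses are hypothetical placeholders.
    """
    base = int(ipaddress.IPv4Address("10.0.0.0"))
    commands = [
        {"add": {"table": {"family": "inet", "name": table}}},
        {"add": {"set": {
            "family": "inet", "table": table, "name": set_name,
            "type": "ipv4_addr", "flags": ["interval"],
        }}},
    ]
    # One "add element" command per network, like the per-entry loop
    # in set_restore().
    for i in range(count):
        addr = str(ipaddress.IPv4Address(base + i * 256))
        commands.append({"add": {"element": {
            "family": "inet", "table": table, "name": set_name,
            "elem": [{"prefix": {"addr": addr, "len": 24}}],
        }}})
    return {"nftables": commands}


if __name__ == "__main__":
    # Show the last command of a tiny ruleset as a sanity check.
    print(json.dumps(build_set_ruleset(3)["nftables"][-1], indent=2))
```

Writing json.dumps(build_set_ruleset(50000)) to a file should give an input that can be fed to the two nft invocations above. Whether it shows the same slowdown as the real blob is untested here.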


--->8---


# cd /usr/lib/python3.8/site-packages/firewall
# patch -p0 < /tmp/patch
patching file core/nftables.py

/tmp/patch:

--- core/nftables.py            10:05:28.159696921 -0400
+++ core/nftables.py.new        2020-05-13 10:06:04.622696921 -0400
@@ -1756,8 +1756,12 @@
     def set_restore(self, set_name, type_name, entries,
                     create_options=None, entry_options=None):
         rules = []
+        self.nftables.set_echo_output(False)
+        self.nftables.set_handle_output(False)
         rules.extend(self.build_set_create_rules(set_name, type_name, create_options))
         rules.extend(self.build_set_flush_rules(set_name))
         for entry in entries:
             rules.extend(self.build_set_add_rules(set_name, entry))
         self.set_rules(rules, self._fw.get_log_denied())
+        self.nftables.set_echo_output(True)
+        self.nftables.set_handle_output(True)
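
One caveat with the patch as posted: if set_rules() raises, echo and handle output stay disabled. A possible hardening, shown only as a sketch and not as what firewalld ships, is to wrap the toggle in a context manager that restores the previous values. The helper name echo_disabled and the FakeNft stand-in are hypothetical; the sketch also assumes the nftables object offers get_echo_output()/get_handle_output() getters next to the setters used in the patch.

```python
from contextlib import contextmanager


@contextmanager
def echo_disabled(nft):
    """Disable echo/handle output, restoring the old values on exit."""
    prev_echo = nft.get_echo_output()
    prev_handle = nft.get_handle_output()
    nft.set_echo_output(False)
    nft.set_handle_output(False)
    try:
        yield nft
    finally:
        # Runs on success and on exceptions alike.
        nft.set_echo_output(prev_echo)
        nft.set_handle_output(prev_handle)


class FakeNft:
    """Hypothetical stand-in for the nftables object, for demonstration."""

    def __init__(self):
        self.echo = True
        self.handle = True

    def get_echo_output(self):
        return self.echo

    def set_echo_output(self, val):
        self.echo = val

    def get_handle_output(self):
        return self.handle

    def set_handle_output(self, val):
        self.handle = val


if __name__ == "__main__":
    nft = FakeNft()
    with echo_disabled(nft):
        print("inside:", nft.echo, nft.handle)
    print("after:", nft.echo, nft.handle)
```

In set_restore() the rule building and the set_rules() call would then sit inside "with echo_disabled(self.nftables):".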

--- Additional comment from Eric Garver on 2020-05-13 14:18:31 UTC ---

Don't forget to _fully_ restart firewalld to get the code change:

[root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload
success

real    0m22.153s
user    0m0.190s
sys     0m0.036s

[root@vmhost-fedora-test1 firewall]# systemctl restart firewalld
[root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload
success

real    0m1.774s
user    0m0.199s
sys     0m0.032s

--- Additional comment from Eric Garver on 2020-05-13 15:03:29 UTC ---

This has been addressed by an upstream nftables patch. My testing shows it makes at least a 10x improvement. As such, reassigning to nftables.

--

# time firewall-cmd --reload
success
real    0m20.585s

# git am /tmp/mbox
Applying: JSON: Improve performance of json_events_cb()
# make install

# systemctl restart firewalld
# time firewall-cmd --reload
success
real    0m2.643s
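
To compare runs like the ones above without doing the arithmetic by hand, a small helper along these lines could be used. It is only a sketch: parse_real and speedup are made-up names, and it assumes the "real" value has the bash `time` shape of NmS.SSSs.

```python
import re


def parse_real(text):
    """Convert a bash `time` value such as '0m20.585s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(before, after):
    """Return how many times faster `after` is compared to `before`."""
    return parse_real(before) / parse_real(after)


if __name__ == "__main__":
    # Reload timings quoted in this comment (before and after the patch).
    print(f"{speedup('0m20.585s', '0m2.643s'):.2f}x")
```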

Comment 1 Eric Garver 2020-05-13 15:22:43 UTC

Created attachment 1688094
json blob to trigger slow adding of elements

Comment 2 Phil Sutter 2020-05-13 15:23:39 UTC


*** This bug has been marked as a duplicate of bug 1835300 ***
