Bug 1834853 - adding many set elements with --echo is very slow
Summary: adding many set elements with --echo is very slow
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nftables
Version: 32
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Kevin Fenzi
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-05-12 14:25 UTC by Richard Shaw
Modified: 2020-06-30 17:51 UTC (History)
5 users (show)

Fixed In Version: nftables-0.9.3-3.fc32
Clone Of:
: 1835315 (view as bug list)
Environment:
Last Closed: 2020-05-19 02:52:39 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
json blob used in comment 4 (11.95 MB, text/plain)
2020-06-30 17:51 UTC, Eric Garver
no flags Details

Description Richard Shaw 2020-05-12 14:25:42 UTC
Description of problem:
After upgrading from Fedora 31 to 32 I noticed that firewalld was hanging using 100% CPU of one core. There was no useful output in the logs due to the stdout/stderr being redirected to null (which is a separate issue).

After running manually with --debug I noticed it was hanging while trying to load my blacklist ipset.

I tried removing the blacklist.xml files from /etc/firewalld/ipset and restarting normally worked, but as soon as I tried to reinstall my blacklist ipset firewalld hung again.


Version-Release number of selected component (if applicable):
firewalld-0.8.2-2.fc32.noarch


Additional information:
Blacklist ipsets added using networks from ipdeny for specific countries using the following command:

firewall-cmd -q --permanent --new-ipset=blacklist --type=hash:net \
    --option=family=inet --option=hashsize=4096 --option=maxelem=200000 \
    --set-description="An ipset list of networks or ips to be dropped."

for country in $countries; do
    firewall-cmd -q --permanent --ipset=blacklist \
        --add-entries-from-file=./$country.zone
done

Comment 1 Eric Garver 2020-05-12 19:06:35 UTC
I was able to reproduce this. Attaching strace while CPU utilization is high shows lots of netlink activity (i.e. communicating with the kernel/nftables via libnftbales). CPU utilization should drop once the set is finished applying to nftables. For "cn" zone it takes ~22 seconds on a modest VM.

Of course if you have IndividualCalls=yes in firewalld.conf it's much, much worse. But that's a non-default setting and only meant for debug.

Comment 2 Richard Shaw 2020-05-12 19:13:01 UTC
My CPU utilization never seems to complete (waited at least an hour) and both the firewalld-config GUI and firewall-cmd hang without being able to communicate with firewalld. I'm currently blacklisting 22 countries.

Comment 3 Eric Garver 2020-05-13 12:14:35 UTC
This is still being investigated. In the meantime a workaround is to revert to the iptables backend by setting FirewalldBackend=iptables in firewalld.conf.

Comment 4 Eric Garver 2020-05-13 14:13:40 UTC
Any chance you could try the below patch? It disables echo support when restoring sets. This seems to be the cause of the slow set apply. It can be reproduced outside of firewalld.

# nft flush ruleset; time nft  --handle -j -f /tmp/json.nft
real    0m0.496s

# nft flush ruleset; time nft -e  --handle -j -f /tmp/json.nft > /dev/null
real    0m20.356s


--->8---


# cd /usr/lib/python3.8/site-packages/firewall
# patch -p0 < /tmp/patch
patching file core/nftables.py

/tmp/patch:

--- core/nftables.py            10:05:28.159696921 -0400
+++ core/nftables.py.new        2020-05-13 10:06:04.622696921 -0400
@@ -1756,8 +1756,12 @@
     def set_restore(self, set_name, type_name, entries,
                     create_options=None, entry_options=None):
         rules = []
+        self.nftables.set_echo_output(False)
+        self.nftables.set_handle_output(False)
         rules.extend(self.build_set_create_rules(set_name, type_name, create_options))
         rules.extend(self.build_set_flush_rules(set_name))
         for entry in entries:
             rules.extend(self.build_set_add_rules(set_name, entry))
         self.set_rules(rules, self._fw.get_log_denied())
+        self.nftables.set_echo_output(True)
+        self.nftables.set_handle_output(True)

Comment 5 Eric Garver 2020-05-13 14:18:31 UTC
Don't forget to _fully_ restart firewalld to get the code change:

[root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload
success

real    0m22.153s
user    0m0.190s
sys     0m0.036s

[root@vmhost-fedora-test1 firewall]# systemctl restart firewalld
[root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload
success

real    0m1.774s
user    0m0.199s
sys     0m0.032s

Comment 6 Eric Garver 2020-05-13 15:03:29 UTC
This has been addressed by nftables upstream patch. My testing shows it makes at least a 10x improvement. As such, reassigning to nftables.

--

# time firewall-cmd --reload                        
success                                                                               
real    0m20.585s

# git am /tmp/mbox                                  
Applying: JSON: Improve performance of json_events_cb()
# make install

# systemctl restart firewalld
# time firewall-cmd --reload                                                                          
success
real    0m2.643s

Comment 7 Richard Shaw 2020-05-13 17:20:04 UTC
Yes I believe this fixed it.

I could run my blacklist script which now completes instead of hanging. Even after it completes firewalld uses about 70-90% of a core for approximately 3 minutes but firewall-cmd worked instead of hanging so that's good.

Is the delay due to translating the ipsets to nftables? If so, is there a native nftables method of doing the same thing?

Comment 8 Eric Garver 2020-05-14 11:42:01 UTC
Most of the delay was caused by an issue in libnftables. The above firewalld patch works around the issue.

It's fixed by this commit:

  c96c7da272e3 ("JSON: Improve performance of json_events_cb()")

Comment 9 Richard Shaw 2020-05-15 12:39:44 UTC
Normally I would wait longer before nagging a maintainer but I think this is fairly serious and needs attention.

Comment 10 Richard Shaw 2020-05-15 12:51:54 UTC
@Eric

I got the firewalld update today which necessitated reapplication of the patch, however, it doesn't seem to work correctly anymore. I'm back to 100% CPU on one core for over 5 minutes now.

Comment 11 Kevin Fenzi 2020-05-15 15:25:35 UTC
I'm aware of the bug. ;) I will try and find time to push out a patched version... if someone else wants to do that or submit a tested PR, please do.

Comment 12 Richard Shaw 2020-05-15 17:54:01 UTC
Submitted.

https://src.fedoraproject.org/rpms/nftables/pull-request/2

Comment 13 Fedora Update System 2020-05-15 19:58:26 UTC
FEDORA-2020-85a1a1b406 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-85a1a1b406

Comment 14 Fedora Update System 2020-05-16 05:07:12 UTC
FEDORA-2020-85a1a1b406 has been pushed to the Fedora 32 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-85a1a1b406`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-85a1a1b406

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2020-05-19 02:52:39 UTC
FEDORA-2020-85a1a1b406 has been pushed to the Fedora 32 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 16 Eric Garver 2020-06-30 17:51:48 UTC
Created attachment 1699374 [details]
json blob used in comment 4


Note You need to log in before you can comment on or make changes to this bug.