Description of problem: After upgrading from Fedora 31 to 32 I noticed that firewalld was hanging using 100% CPU of one core. There was no useful output in the logs due to the stdout/stderr being redirected to null (which is a separate issue). After running manually with --debug I noticed it was hanging while trying to load my blacklist ipset. I tried removing the blacklist.xml files from /etc/firewalld/ipset and restarting normally worked, but as soon as I tried to reinstall my blacklist ipset firewalld hung again. Version-Release number of selected component (if applicable): firewalld-0.8.2-2.fc32.noarch Additional information: Blacklist ipsets added using networks from ipdeny for specific countries using the following command: firewall-cmd -q --permanent --new-ipset=blacklist --type=hash:net \ --option=family=inet --option=hashsize=4096 --option=maxelem=200000 \ --set-description="An ipset list of networks or ips to be dropped." for country in $countries; do firewall-cmd -q --permanent --ipset=blacklist \ --add-entries-from-file=./$country.zone done
I was able to reproduce this. Attaching strace while CPU utilization is high shows lots of netlink activity (i.e. communicating with the kernel/nftables via libnftbales). CPU utilization should drop once the set is finished applying to nftables. For "cn" zone it takes ~22 seconds on a modest VM. Of course if you have IndividualCalls=yes in firewalld.conf it's much, much worse. But that's a non-default setting and only meant for debug.
My CPU utilization never seems to complete (waited at least an hour) and both the firewalld-config GUI and firewall-cmd hang without being able to communicate with firewalld. I'm currently blacklisting 22 countries.
This is still being investigated. In the meantime a workaround is to revert to the iptables backend by setting FirewalldBackend=iptables in firewalld.conf.
Any chance you could try the below patch? It disables echo support when restoring sets. This seems to be the cause of the slow set apply. It can be reproduced outside of firewalld. # nft flush ruleset; time nft --handle -j -f /tmp/json.nft real 0m0.496s # nft flush ruleset; time nft -e --handle -j -f /tmp/json.nft > /dev/null real 0m20.356s --->8--- # cd /usr/lib/python3.8/site-packages/firewall # patch -p0 < /tmp/patch patching file core/nftables.py /tmp/patch: --- core/nftables.py 10:05:28.159696921 -0400 +++ core/nftables.py.new 2020-05-13 10:06:04.622696921 -0400 @@ -1756,8 +1756,12 @@ def set_restore(self, set_name, type_name, entries, create_options=None, entry_options=None): rules = [] + self.nftables.set_echo_output(False) + self.nftables.set_handle_output(False) rules.extend(self.build_set_create_rules(set_name, type_name, create_options)) rules.extend(self.build_set_flush_rules(set_name)) for entry in entries: rules.extend(self.build_set_add_rules(set_name, entry)) self.set_rules(rules, self._fw.get_log_denied()) + self.nftables.set_echo_output(True) + self.nftables.set_handle_output(True)
Don't forget to _fully_ restart firewalld to get the code change: [root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload success real 0m22.153s user 0m0.190s sys 0m0.036s [root@vmhost-fedora-test1 firewall]# systemctl restart firewalld [root@vmhost-fedora-test1 firewall]# time firewall-cmd --reload success real 0m1.774s user 0m0.199s sys 0m0.032s
This has been addressed by nftables upstream patch. My testing shows it makes at least a 10x improvement. As such, reassigning to nftables. -- # time firewall-cmd --reload success real 0m20.585s # git am /tmp/mbox Applying: JSON: Improve performance of json_events_cb() # make install # systemctl restart firewalld # time firewall-cmd --reload success real 0m2.643s
Yes I believe this fixed it. I could run my blacklist script which now completes instead of hanging. Even after it completes firewalld uses about 70-90% of a core for approximately 3 minutes but firewall-cmd worked instead of hanging so that's good. Is the delay due to translating the ipsets to nftables? If so, is there a native nftables method of doing the same thing?
Most of the delay was caused by an issue in libnftables. The above firewalld patch works around the issue. It's fixed by this commit: c96c7da272e3 ("JSON: Improve performance of json_events_cb()")
Normally I would wait longer before nagging a maintainer but I think this is fairly serious and needs attention.
@Eric I got the firewalld update today which necessitated reapplication of the patch, however, it doesn't seem to work correctly anymore. I'm back to 100% CPU on one core for over 5 minutes now.
I'm aware of the bug. ;) I will try and find time to push out a patched version... if someone else wants to do that or submit a tested PR, please do.
Submitted. https://src.fedoraproject.org/rpms/nftables/pull-request/2
FEDORA-2020-85a1a1b406 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-85a1a1b406
FEDORA-2020-85a1a1b406 has been pushed to the Fedora 32 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-85a1a1b406` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-85a1a1b406 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-85a1a1b406 has been pushed to the Fedora 32 stable repository. If problem still persists, please make note of it in this bug report.
Created attachment 1699374 [details] json blob used in comment 4