Bug 1921624
Summary: | 'nft -f' takes about triple times to restore nftables rules | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Li Shuang <shuali> |
Component: | kernel | Assignee: | Richard Guy Briggs <rbriggs> |
kernel sub component: | Netfilter | QA Contact: | Li Shuang <shuali> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | egarver, jiji, network-qe, rbriggs |
Version: | 8.5 | Keywords: | Regression, Triaged |
Target Milestone: | rc | ||
Target Release: | 8.4 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-06-28 12:30:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Li Shuang
2021-01-28 10:25:39 UTC
Created attachment 1751634 [details]
1.rules
Created attachment 1751635 [details]
2.rules
'nft -f 2.rules' takes about triple times on kernel-4.18.0-277.el8 compared with kernel-4.18.0-240.el8. When reproducing this, I got even much worse results and it looks like recently added audit logging is killing performance: [root@wsfd-netdev13 nft_performance]# nft flush ruleset [root@wsfd-netdev13 nft_performance]# nft -f 1.rules [root@wsfd-netdev13 nft_performance]# auditctl -e0 enabled 0 failure 1 pid 1126 rate_limit 0 backlog_limit 8192 lost 0 backlog 3 backlog_wait_time 60000 [root@wsfd-netdev13 nft_performance]# time nft -f 2.rules real 0m0.538s user 0m0.407s sys 0m0.126s [root@wsfd-netdev13 nft_performance]# nft flush ruleset [root@wsfd-netdev13 nft_performance]# nft -f 1.rules [root@wsfd-netdev13 nft_performance]# auditctl -e1 enabled 1 failure 1 pid 1126 rate_limit 0 backlog_limit 8192 lost 0 backlog 0 backlog_wait_time 60000 [root@wsfd-netdev13 nft_performance]# time nft -f 2.rules real 0m23.890s user 0m0.397s sys 0m0.238s IIUC, the kernel emits an audit log for each and every one of the 70k elements being added. What's worse, they all look identical, so there's not even a benefit in doing that: type=NETFILTER_CFG msg=audit(1611927321.092:155): table=filter:13;xtest:2 family=7 entries=2 op=nft_register_setelem pid=2844 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm="nft" type=NETFILTER_CFG msg=audit(1611927321.092:155): table=filter:13;xtest:2 family=7 entries=2 op=nft_register_setelem pid=2844 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm="nft" type=NETFILTER_CFG msg=audit(1611927321.092:155): table=filter:13;xtest:2 family=7 entries=2 op=nft_register_setelem pid=2844 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm="nft" Li Shuang, I wonder why your numbers are much better. Did you run the tests with auditd disabled? Richard, can we reduce the logging (for elements at least) to once per commit instead of once per element? (In reply to Phil Sutter from comment #4) > When reproducing this, I got even much worse results and it looks like recently added audit logging is killing performance: Yes, this was added to address the upstream issue: https://github.com/linux-audit/audit-kernel/issues/124 > IIUC, the kernel emits an audit log for each and every one of the 70k elements being added. What's worse, they all look identical, so there's not even a benefit in doing that: This is similar to a recent fedora issue, and your suggestion will help there too. https://bugzilla.redhat.com/show_bug.cgi?id=1918013 The records aren't identical if we add the network namspace identifiers (dev/inode). I already have a patch for that, but is waiting for other work to go upstream first: https://github.com/linux-audit/audit-kernel/issues/79 "RFE: add net namespace ID to NETFILTER_CFG records" > Richard, can we reduce the logging (for elements at least) to once per commit instead of once per element? I sought that kind of feedback when the original patchset was posted upstream. Can we work on this together to keep the essentials while ditching the noise? Hi Phil, I used the default auditd configure to do the above tests, so I think it's auditd serivce started status with default configure. I also rerun the tests on both kernel 240 and 279 with auditd disabled/enabled status, more details are as follows: +----------------+----------+----------+-------------------------------------------------------------+ | | auditd | auditctl | 2.rules | +----------------+----------+----------+------+----------+----------+----------+----------+----------+ | | | | real | 0m0.394s | 0m0.395s | 0m0.413s | 0m0.389s | 0m0.417s | | | + +------+----------+----------+----------+----------+----------+ | | disabled | - | user | 0m0.299s | 0m0.301s | 0m0.314s | 0m0.291s | 0m0.311s | | | + +------+----------+----------+----------+----------+----------+ | | | | sys | 0m0.089s | 0m0.093s | 0m0.097s | 0m0.097s | 0m0.102s | | 4.18.0-240.el8 +----------+----------+------+----------+----------+----------+----------+----------+ | | | | real | 0m0.397s | 0m0.396s | 0m0.399s | 0m0.395s | 0m0.401s | | | + +------+----------+----------+----------+----------+----------+ | | enabled | - | user | 0m0.298s | 0m0.298s | 0m0.304s | 0m0.296s | 0m0.299s | | | + +------+----------+----------+----------+----------+----------+ | | | | sys | 0m0.093s | 0m0.096s | 0m0.093s | 0m0.096s | 0m0.102s | +----------------+----------+----------+------+----------+----------+----------+----------+----------+ | | | | real | 0m0.421s | 0m0.439s | 0m0.418s | 0m0.432s | 0m0.430s | | | + +------+----------+----------+----------+----------+----------+ | | disabled | - | user | 0m0.306s | 0m0.303s | 0m0.305s | 0m0.304s | 0m0.304s | | | + +------+----------+----------+----------+----------+----------+ | | | | sys | 0m0.110s | 0m0.134s | 0m0.112s | 0m0.126s | 0m0.124s | | +----------+----------+------+----------+----------+----------+----------+----------+ | | | | real | 0m1.093s | 0m1.075s | 0m1.137s | 0m1.127s | 0m1.128s | | | + +------+----------+----------+----------+----------+----------+ | | | - | user | 0m0.306s | 0m0.305s | 0m0.299s | 0m0.311s | 0m0.302s | | | + +------+----------+----------+----------+----------+----------+ | | | | sys | 0m0.258s | 0m0.252s | 0m0.285s | 0m0.260s | 0m0.247s | + 4.18.0-279.el8 + +----------+------+----------+----------+----------+----------+----------+ | | | | real | 0m0.447s | 0m0.433s | 0m0.431s | 0m0.425s | 0m0.427s | | | + +------+----------+----------+----------+----------+----------+ | | enabled | e0 | user | 0m0.306s | 0m0.304s | 0m0.309s | 0m0.305s | 0m0.307s | | | + +------+----------+----------+----------+----------+----------+ | | | | sys | 0m0.139s | 0m0.127s | 0m0.120s | 0m0.118s | 0m0.118s | | + +----------+------+----------+----------+----------+----------+----------+ | | | | real | 0m1.215s | 0m1.123s | 0m1.207s | 0m1.220s | 0m1.199s | | | + +------+----------+----------+----------+----------+----------+ | | | e1 | user | 0m0.303s | 0m0.285s | 0m0.315s | 0m0.310s | 0m0.297s | | | + +------+----------+----------+----------+----------+----------+ | | | | sys | 0m0.246s | 0m0.259s | 0m0.220s | 0m0.233s | 0m0.236s | +----------------+----------+----------+------+----------+----------+----------+----------+----------+ I'm not sure why the results are better in my tests, and I can provide my environment as follows: dell-per740-09.rhts.eng.pek2.redhat.com root/redhat Hope this helps. Thanks, Shuang Hi, (In reply to Richard Guy Briggs from comment #5) [...] > > Richard, can we reduce the logging (for elements at least) to once per commit instead of once per element? > > I sought that kind of feedback when the original patchset was posted > upstream. Can we work on this together to keep the essentials while > ditching the noise? Richard, yes, I have this on my TODO for this week. Will get back to you later. Li Shuang, thanks for verifying. It is still unclear why your bad results are so much less bad than mine, but I guess we have found the culprit as your results show the relation to auditd as well. Thanks, Phil Created attachment 1754558 [details]
iptables.rules
Hi Phil,
I did "iptables -F; time iptables-restore < iptables.rules;" test on kernel-4.18.0-280.el8.x86_64 and it seems that iptables-restore also spend about double time to add the same number of iptables stateless rules when audit is enabled:
+----------------+----------+-------------------------------------------------------------+
| | auditctl | iptables.rules |
+----------------+----------+------+----------+----------+----------+----------+----------+
| | | real | 0m2.467s | 0m2.313s | 0m2.385s | 0m2.455s | 0m2.477s |
| | +------+----------+----------+----------+----------+----------+
| | e0 | user | 0m1.212s | 0m1.159s | 0m1.188s | 0m1.187s | 0m1.215s |
| | +------+----------+----------+----------+----------+----------+
| | | sys | 0m1.242s | 0m1.141s | 0m1.183s | 0m1.256s | 0m1.246s |
| 4.18.0-280.el8 +----------+------+----------+----------+----------+----------+----------+
| | | real | 0m4.078s | 0m3.867s | 0m3.848s | 0m3.922s | 0m3.880s |
| | +------+----------+----------+----------+----------+----------+
| | e1 | user | 0m1.270s | 0m1.171s | 0m1.172s | 0m1.234s | 0m1.210s |
| | +------+----------+----------+----------+----------+----------+
| | | sys | 0m1.543s | 0m1.420s | 0m1.428s | 0m1.497s | 0m1.479s |
+----------------+----------+------+----------+----------+----------+----------+----------+
Do you think the bug need a RHEL-8.4.0 blocker?
Thanks,
Shuang
Hi Li Shuang, (In reply to Li Shuang from comment #8) > I did "iptables -F; time iptables-restore < iptables.rules;" test on > kernel-4.18.0-280.el8.x86_64 and it seems that iptables-restore also spend > about double time to add the same number of iptables stateless rules when > audit is enabled: Thanks a lot for doing these tests! I wonder if there's a functional difference to legacy iptables which presumably has the auditd calls as well. Maybe there's just a single log per commit instead of one per rule? You do see one line of audit log for each rule applied, right? > Do you think the bug need a RHEL-8.4.0 blocker? I am not sure. On one hand it is clearly a performance regression. On the other, not having audit logging technically qualifies as regression from legacy iptables. Also, I assume there's a business case for having this. Cheers, Phil (In reply to Phil Sutter from comment #9) > I wonder if there's a functional difference to legacy iptables which presumably has the auditd calls as well. Maybe there's just a single log per commit instead of one per rule? You do see one line of audit log for each rule applied, right? Legacy iptables has a single audit event per table change. I was hoping for the same when auditting nft, but it wasn't so simple. > > Do you think the bug need a RHEL-8.4.0 blocker? > > I am not sure. On one hand it is clearly a performance regression. On the other, not having audit logging technically qualifies as regression from legacy iptables. Also, I assume there's a business case for having this. Yes, security certification. Created attachment 1755060 [details]
audit.tar.gz
Hi Phil,
I run "iptables -F; time iptables-restore < iptables.rules;" again with "auditctl -e1" status and found that all log files under /var/log/audit folder have been rewritten. I attached the log files as audit.tar.gz.
It seems that a lot of entries as follows are added in /var/log/audit/audit.log during iptables rules restoring:
type=NETFILTER_CFG msg=audit(1612442571.873:119): table=filter:6;FORWARD:2 family=2 entries=35545 op=nft_unregister_rule pid=2461 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm="iptables"
type=NETFILTER_CFG msg=audit(1612442571.873:119): table=filter:6;FORWARD:2 family=2 entries=35546 op=nft_unregister_rule pid=2461 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm="iptables"
type=NETFILTER_CFG msg=audit(1612442571.873:119): table=filter:6;FORWARD:2 family=2 entries=35547 op=nft_unregister_rule pid=2461 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm="iptables"
Thanks,
Shuang
2021-03-18 v1 patch posted upstream https://listman.redhat.com/archives/linux-audit/2021-March/msg00109.html 2021-03-22 v2 patch posted upstream https://listman.redhat.com/archives/linux-audit/2021-March/msg00149.html 2021-03-23 v3 patch posted upstream https://listman.redhat.com/archives/linux-audit/2021-March/msg00152.html v4 posted upstream https://listman.redhat.com/archives/linux-audit/2021-March/msg00157.html v5 posted upstream https://listman.redhat.com/archives/linux-audit/2021-March/msg00159.html 2021-03-31 v5 merged upstream in nf-next bb4052e57b5b ("audit: log nftables configuration change events once per table") 2021-03-31 FW found UAF: nf-next fix e4d272948d25 <pablo> ("netfilter: nf_tables: use-after-free") 2021-03-31 v5 merged upstream in nf-next forced update c520292f29b8 ("audit: log nftables configuration change events once per table") 2021-04-02 07:45 Dan Carpenter dadf33c9f6b5 ("netfilter: nftables: fix a warning message in nf_tables_commit_audit_collect()") resolved by bz1957052, in rhel8.5 kernel-4.18.0-323.el8, also in rhel8.4.z |