Bug 1753677
Summary: High cpu usage while non-controlled interface is mangling tc filters
Product: Red Hat Enterprise Linux 8
Component: NetworkManager
Version: 8.2
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Marcelo Ricardo Leitner <mleitner>
Assignee: Beniamino Galvani <bgalvani>
QA Contact: Matej Berezny <mberezny>
CC: acardace, bgalvani, dcaratti, fbaudin, fge, fgiudici, lmiksik, lrintel, mberezny, rkhan, sukulkar, thaller, till, vbenes
Keywords: Reopened, Triaged
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Linux
Fixed In Version: NetworkManager-1.34.0-0.2.el8
Doc Type: No Doc Update
Type: Bug
Last Closed: 2022-05-10 14:54:06 UTC
Description
Marcelo Ricardo Leitner
2019-09-19 14:59:14 UTC
Created attachment 1646656 [details]
Flamegraph during the insert test
Here we can use up to 6 CPUs to process it. Yes, NM is not a big part of it, yet it's arguably useless work being done. Remember that each message ignored by NM userspace is at least one skb cloned in the kernel.
Created attachment 1646657 [details]
Same flamegraph, but wider/more readable
Attachments from comment #2 and #3 are for tests with OvS, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1785040#c0

Sorry, I had forgotten about this bz and commented on this one thinking it was the new one. Will fix it now by closing the new one as a dupe of this.

*** Bug 1785040 has been marked as a duplicate of this bug. ***

One idea to fix this, from the OvS side, is to create a new flag for adding new filters, one that silences the broadcasting of such events. AFAIK OvS doesn't rely on the event (even though it probably should). Although IMHO the probability of this being rejected upstream is quite high, because it's like saying "hey please skip this packet from tcpdumps, just sneak it through!".

(In reply to Marcelo Ricardo Leitner from comment #6)
> One idea to fix this, from the OvS side, is to create a new flag for
> adding new filters, one that silences the broadcasting of such events.
> AFAIK OvS doesn't rely on the event (even though it probably should).

Actually it does rely on it, and the broadcast would need to be turned into a unicast only. ovs' parse_netlink_to_tc_flower() parses it entirely.

> One idea to fix this, from the OvS side, is to create a new flag for adding new filters, one that silences the broadcasting of such events.
If these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter show`), then that seems very wrong.
The purpose of the netlink notifications is to notify about the content of the configured "objects", so as not to have to fetch them all. Not emitting events defeats the purpose of why they exist.
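For readers unfamiliar with how these notifications reach userspace, here is a minimal sketch, not taken from NetworkManager or OvS. It assumes a listener that joins the rtnetlink TC multicast group and merely counts tc filter add/delete messages per receive buffer. The numeric constants are the values from linux/rtnetlink.h; the function names `count_tfilter_events` and `listen_for_tc_events` are hypothetical.

```python
import socket
import struct

# Values from linux/rtnetlink.h.
RTM_NEWTFILTER = 44
RTM_DELTFILTER = 45
RTNLGRP_TC = 4

# struct nlmsghdr: length, type, flags, sequence number, sender port id.
NLMSG_HDR = struct.Struct("=IHHII")


def count_tfilter_events(buf: bytes) -> int:
    """Count tc filter add/delete notifications in one receive buffer."""
    count = 0
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if msg_len < NLMSG_HDR.size:
            break  # malformed header; stop instead of looping forever
        if msg_type in (RTM_NEWTFILTER, RTM_DELTFILTER):
            count += 1
        offset += (msg_len + 3) & ~3  # netlink messages are 4-byte aligned
    return count


def listen_for_tc_events() -> None:
    """Join the TC multicast group and print event counts (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    # Legacy group bitmask: bit (group - 1) selects the multicast group.
    sock.bind((0, 1 << (RTNLGRP_TC - 1)))
    while True:
        print(count_tfilter_events(sock.recv(65536)))
```

Every process subscribed this way receives its own copy of each notification, whether or not it goes on to use the payload, which is the cost discussed above.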
(In reply to Thomas Haller from comment #8)
> > One idea to fix this, from the OvS side, is to create a new flag for adding new filters, one that silences the broadcasting of such events.
>
> If these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter
> show`), then that seems very wrong.

They are.

> The purpose of the netlink notifications is to notify about the content of
> the configured "objects", so as not to have to fetch them all. Not emitting
> events defeats the purpose of why they exist.

Agreed. At the very least it would hinder debuggability.

(In reply to Marcelo Ricardo Leitner from comment #3)
> Created attachment 1646657 [details]
> Same flamegraph, but wider/more readable

It's strange that tcf_pedit_dump() takes many more cycles compared to other TC actions. Probably this is something that can be improved by reducing the lock contention with the traffic plane (e.g. with the RCU-ification)? That would improve the flame graphs and reduce CPU usage by some percentage points. This would also allow skipping per-cpu counters allocation (and dump) [1], thus obtaining another (small, but maybe visible) speedup.

Do you think it's worth a try?
--
davide

[1] https://lore.kernel.org/netdev/20191022141804.27639-1-vladbu@mellanox.com/

(In reply to Davide Caratti from comment #10)
> do you think it's worth a try?

Yes! From both PoVs, memory and CPU savings (the latter of which is two-fold: per-cpu processing and avoiding spinlock contention). :)

Hi Marcelo,
Is there any action plan for this bug?

Hi Gris. I don't know. I'm not that involved with NM development. Maybe Antonio knows better.

Hi Thomas,
Will this bug be solved as a side effect of https://bugzilla.redhat.com/show_bug.cgi?id=1847125 ?

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Can we make this bug public?
(In reply to Marcelo Ricardo Leitner from comment #24)
> Can we make this bug public?

It seems to me that you made it private. I don't see a reason why it should be private.

Oh! Let's flip it, then. Thanks.

Created attachment 1823201 [details]
Reproducer
Reproducer script to check NM CPU usage with many tc filters.
Result with current git main branch of NM:
# ./rh1753677.sh
* Setting up interface...
* Inserting filters...
* Waiting for NM to settle...
NM execution time:
User: 11760 ms
Kernel: 3800 ms
Total: 15560 ms
With branch `bg/tc-no-cache`:
# ./rh1753677.sh
* Setting up interface...
* Inserting filters...
* Waiting for NM to settle...
NM execution time:
User: 0 ms
Kernel: 0 ms
Total: 0 ms
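
The attached reproducer script is not shown in this report. As a rough illustration of how user and kernel figures like the ones above can be obtained, the sketch below reads the `utime` and `stime` fields of `/proc/<pid>/stat` (fields 14 and 15, in clock ticks, per proc(5)) and converts them to milliseconds. This is an assumption about one possible approach, not a description of the attached script; all function names are hypothetical.

```python
import os


def parse_cpu_ticks(stat_line: str) -> tuple[int, int]:
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line."""
    # The comm field may contain spaces or parentheses, so split after the
    # last ')'. The remaining fields start at field 3 (state), which puts
    # utime (field 14) at index 11 and stime (field 15) at index 12.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[11]), int(fields[12])


def cpu_time_ms(stat_line: str, ticks_per_sec: int) -> dict[str, int]:
    """Convert the tick counters of a stat line to milliseconds."""
    utime, stime = parse_cpu_ticks(stat_line)
    user = utime * 1000 // ticks_per_sec
    kernel = stime * 1000 // ticks_per_sec
    return {"user": user, "kernel": kernel, "total": user + kernel}


def read_process_cpu_ms(pid: int) -> dict[str, int]:
    """Read the accumulated CPU time of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return cpu_time_ms(stat_file.read(), os.sysconf("SC_CLK_TCK"))
```

Sampling `read_process_cpu_ms()` for the NetworkManager PID before and after inserting the filters, then subtracting the two samples, would give the CPU time consumed during the test.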
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1985