Bug 1753677 - High cpu usage while non-controlled interface is mangling tc filters
Summary: High cpu usage while non-controlled interface is mangling tc filters
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.2
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Beniamino Galvani
QA Contact: Matej Berezny
URL:
Whiteboard:
Duplicates: 1785040
Depends On:
Blocks:
 
Reported: 2019-09-19 14:59 UTC by Marcelo Ricardo Leitner
Modified: 2021-10-11 12:54 UTC
CC: 15 users

Fixed In Version: NetworkManager-1.34.0-0.2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-31 07:30:51 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
Flamegraph during the insert test (188.65 KB, image/svg+xml)
2019-12-19 19:20 UTC, Marcelo Ricardo Leitner
Same flamegraph, but wider/more readable (954.18 KB, image/svg+xml)
2019-12-19 19:21 UTC, Marcelo Ricardo Leitner
Reproducer (1.88 KB, application/x-shellscript)
2021-09-15 08:10 UTC, Beniamino Galvani


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1753684 0 high CLOSED Excessive memory usage while dealing with a flood of kernel messages 2021-09-17 09:57:39 UTC
freedesktop.org Gitlab NetworkManager/NetworkManager-ci - merge_requests 850 0 None merged tc: tc_device_filter_management test 2021-10-08 11:25:19 UTC
freedesktop.org Gitlab NetworkManager/NetworkManager - merge_requests 981 0 None merged platform: don't listen for tc netlink messages 2021-09-20 11:38:52 UTC

Internal Links: 1753684

Description Marcelo Ricardo Leitner 2019-09-19 14:59:14 UTC
Description of problem:
The test consists of adding 1M tc filter rules, e.g.:
# tc qdisc add dev p6p1 ingress
# tc -b add.0
where add.0 contains:
[root@wsfd-netdev19 ~]# head add.0
filter add dev p6p1 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0  action gact drop
filter add dev p6p1 protocol ip ingress prio 1 handle 2 flower skip_hw src_mac e4:11:0:0:0:1 dst_mac e4:12:0:0:0:1  action gact drop
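A batch file in that format can be produced with a short shell loop. This is only a sketch (the reporter's actual tooling is not attached; the file name add.0 and the MAC pattern come from the report, the generator itself is hypothetical):

```shell
# Sketch of a generator for a batch file like add.0. Only the last MAC
# octet is varied here, so N must stay <= 256; the real 1M-rule test
# would need to spread the counter over several octets.
DEV=p6p1
N=10
i=0
: > add.0
while [ "$i" -lt "$N" ]; do
    # Encode the rule index into the last octet of both MACs (hex)
    hex=$(printf '%x' "$i")
    printf 'filter add dev %s protocol ip ingress prio 1 handle %d flower skip_hw src_mac e4:11:0:0:0:%s dst_mac e4:12:0:0:0:%s action gact drop\n' \
        "$DEV" "$((i + 1))" "$hex" "$hex" >> add.0
    i=$((i + 1))
done
```

The file is then applied in one batch with `tc -b add.0`, which is what makes the insert fast enough to flood any netlink listener with events.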

even with:
[root@wsfd-netdev19 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p6p1
NM_CONTROLLED=no

I see:
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 10684 root      20   0    9888    700    584 R  99.3  0.0   0:22.06 tc
  1077 root      20   0  755728 292112   6692 S  53.6  0.3   1:41.64 NetworkManager
                                                 ^^^^ excess usage

It keeps using CPU until the tc -b command ends.
It was actually consuming ~90% until this kernel patch was applied: https://bugzilla.redhat.com/show_bug.cgi?id=1744829
The point being: the percentage itself may be flaky, but NM is actively doing work.

I didn't try on RHEL8/Fedora yet.

Version-Release number of selected component (if applicable):
NetworkManager-1.18.0-6.el7.x86_64

Comment 2 Marcelo Ricardo Leitner 2019-12-19 19:20:13 UTC
Created attachment 1646656 [details]
Flamegraph during the insert test

Here we can use up to 6 CPUs to process it. Yes, NM is not a big part of it, yet it's arguably useless work being done. Remember that each message ignored by NM in userspace is at least one skb cloned in the kernel.

Comment 3 Marcelo Ricardo Leitner 2019-12-19 19:21:09 UTC
Created attachment 1646657 [details]
Same flamegraph, but wider/more readable

Comment 4 Marcelo Ricardo Leitner 2019-12-19 19:24:02 UTC
Attachments from comment #2 and #3 are for tests with OvS, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1785040#c0

Sorry, I had forgotten this bz and commented on this one thinking it was the new one.
Will fix that now by closing the new one as a dupe of this one.

Comment 5 Marcelo Ricardo Leitner 2019-12-19 19:24:55 UTC
*** Bug 1785040 has been marked as a duplicate of this bug. ***

Comment 6 Marcelo Ricardo Leitner 2019-12-19 19:30:13 UTC
One idea to fix this, from the OvS side, is to create a new flag for use when adding new filters, one that silences broadcasting of the corresponding event.
AFAIK OvS doesn't rely on the event (even though it probably should).
Although IMHO the probability of this being rejected upstream is quite high, because it's like saying "hey, please skip this packet in tcpdumps, just sneak it through!".

Comment 7 Marcelo Ricardo Leitner 2019-12-20 11:51:18 UTC
(In reply to Marcelo Ricardo Leitner from comment #6)
> One idea to fix this, from the OvS side, is to create a new flag for use
> when adding new filters, one that silences broadcasting of the
> corresponding event.
> AFAIK OvS doesn't rely on the event (even though it probably should).

Actually it does rely on it, and the broadcast would need to be turned into a unicast only.
OvS's parse_netlink_to_tc_flower() parses it entirely.

Comment 8 Thomas Haller 2019-12-20 16:17:45 UTC
> One idea to fix this, from the OvS side, is to create a new flag for use when adding new filters, one that silences broadcasting of the corresponding event.

if these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter show`), then that seems very wrong.

The purpose of the netlink notifications is to notify about the content of the configured "objects", so as not to have to fetch them all. Not emitting events defeats the purpose of their existence.
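These notifications are easy to observe with `tc monitor`, which joins the same RTNLGRP_TC multicast stream that NetworkManager was subscribed to at the time. A sketch (needs root and iproute2, so it guards and skips otherwise; the dummy interface name nmtest0 is made up):

```shell
# Demonstrate that each filter add is broadcast as a tc netlink event:
# `tc monitor` prints the same event stream any subscribed daemon
# (like NM at the time of this bug) had to receive and parse.
# Guarded: skips unless run as root with iproute2 available.
if [ "$(id -u)" -eq 0 ] && command -v tc >/dev/null 2>&1; then
    ip link add nmtest0 type dummy
    tc qdisc add dev nmtest0 ingress

    tc monitor > tc-events.log &   # collect broadcast tc events
    MONPID=$!
    sleep 1

    tc filter add dev nmtest0 protocol ip ingress prio 1 flower skip_hw \
        src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 action gact drop

    sleep 1
    kill "$MONPID"
    cat tc-events.log              # the added filter shows up here
    ip link del nmtest0
else
    echo "SKIP: needs root and iproute2"
fi
DONE=1
```

Multiply that single event by the 1M rules of the batch insert and you get the message flood NM was spending CPU on.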

Comment 9 Marcelo Ricardo Leitner 2019-12-20 17:27:42 UTC
(In reply to Thomas Haller from comment #8)
> > One idea to fix this is, from OvS side, is to create a new flag for when adding new filters, one that silence broadcasting such event.
> 
> if these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter
> show`), then that seems very wrong.

They are.

> 
> The purpose of the netlink notifications is to notify about the content of
> the configured "objects", so as not to have to fetch them all. Not emitting
> events defeats the purpose of their existence.

Agreed. At the very least it would hinder debuggability.

Comment 10 Davide Caratti 2020-06-09 14:48:24 UTC
(In reply to Marcelo Ricardo Leitner from comment #3)
> Created attachment 1646657 [details]
> Same flamegraph, but wider/more readable

it's strange that tcf_pedit_dump() takes many more cycles compared
to other TC actions. Probably this is something that can be improved
by reducing the lock contention with the traffic plane (e.g. with
RCU-ification)? That would improve the flame graphs and reduce CPU
usage by a few percentage points. It would also allow skipping the
per-CPU counter allocation (and dump) [1], thus obtaining another
(small, but maybe visible) speedup.

do you think it's worth a try?
-- 
davide

[1] https://lore.kernel.org/netdev/20191022141804.27639-1-vladbu@mellanox.com/

Comment 11 Marcelo Ricardo Leitner 2020-06-09 15:18:29 UTC
(In reply to Davide Caratti from comment #10)
> do you think it's worth a try?

Yes! From both points of view, memory and CPU savings (the latter being two-fold: less per-CPU processing and avoiding spinlock contention). :)

Comment 17 Gris Ge 2021-03-02 06:11:03 UTC
Hi Marcelo,

Is there any action plan for this bug?

Comment 18 Marcelo Ricardo Leitner 2021-03-02 15:07:28 UTC
Hi Gris. I don't know. I'm not that involved with NM development. Maybe Antonio knows better.

Comment 21 Gris Ge 2021-03-24 04:20:11 UTC
Hi Thomas,

Will this bug be solved as a side effect of https://bugzilla.redhat.com/show_bug.cgi?id=1847125 ?

Comment 22 RHEL Program Management 2021-03-31 07:30:51 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 24 Marcelo Ricardo Leitner 2021-07-09 19:32:55 UTC
Can we make this bug public?

Comment 25 Till Maas 2021-07-09 19:55:45 UTC
(In reply to Marcelo Ricardo Leitner from comment #24)
> Can we make this bug public?

It seems to me that you made it private. I don't see a reason why it should be private.

Comment 26 Marcelo Ricardo Leitner 2021-07-09 20:46:41 UTC
Oh! Let's flip it, then. Thanks.

Comment 28 Beniamino Galvani 2021-09-15 08:10:59 UTC
Created attachment 1823201 [details]
Reproducer

Reproducer script to check NM CPU usage with many tc filters.

Result with current git main branch of NM:

    # ./rh1753677.sh
    * Setting up interface...
    * Inserting filters...
    * Waiting for NM to settle...

    NM execution time:
    User:    11760 ms
    Kernel:   3800 ms
    Total:   15560 ms

With branch `bg/tc-no-cache`:

    # ./rh1753677.sh
    * Setting up interface...
    * Inserting filters...
    * Waiting for NM to settle...

    NM execution time:
    User:        0 ms
    Kernel:      0 ms
    Total:       0 ms
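The "NM execution time" figures above can be cross-checked independently by reading the accumulated utime/stime of the daemon from /proc. A sketch of such a helper (hypothetical, not the attached reproducer; field positions per proc(5)):

```shell
# Print the accumulated user/kernel CPU time of a process in ms, read
# from /proc/<pid>/stat (fields 14 and 15, in clock ticks). Pass a PID,
# e.g. the NetworkManager PID; defaults to this shell for demonstration.
PID=${1:-$$}
CLK_TCK=$(getconf CLK_TCK)      # clock ticks per second, usually 100

stat=$(cat "/proc/$PID/stat")
rest=${stat##*) }               # drop "pid (comm) " so spaces in the
set -- $rest                    # command name can't shift the fields
utime_ms=$(( ${12} * 1000 / CLK_TCK ))  # field 14 overall = field 12 here
stime_ms=$(( ${13} * 1000 / CLK_TCK ))  # field 15 overall = field 13 here

printf 'User:   %6d ms\n' "$utime_ms"
printf 'Kernel: %6d ms\n' "$stime_ms"
printf 'Total:  %6d ms\n' "$(( utime_ms + stime_ms ))"
```

Run it against `$(pidof NetworkManager)` before and after the batch insert and subtract, which is essentially what the reproducer's summary reports.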

