Bug 1753677

Summary: High cpu usage while non-controlled interface is mangling tc filters
Product: Red Hat Enterprise Linux 8 Reporter: Marcelo Ricardo Leitner <mleitner>
Component: NetworkManagerAssignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Matej Berezny <mberezny>
Severity: high Docs Contact:
Priority: high    
Version: 8.2CC: acardace, bgalvani, dcaratti, fbaudin, fge, fgiudici, lmiksik, lrintel, mberezny, rkhan, sukulkar, thaller, till, vbenes
Target Milestone: rcKeywords: Reopened, Triaged
Target Release: 8.0   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.34.0-0.2.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-10 14:54:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Flamegraph during the insert test
none
Same flamegraph, but wider/more readable
none
Reproducer none

Description Marcelo Ricardo Leitner 2019-09-19 14:59:14 UTC
Description of problem:
Test consist of adding 1M tc filter rules, such as:
# tc qdisc add dev p6p1 ingress
# tc -b add.0
on which:
[root@wsfd-netdev19 ~]# head add.0
filter add dev p6p1 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0  action gact drop
filter add dev p6p1 protocol ip ingress prio 1 handle 2 flower skip_hw src_mac e4:11:0:0:0:1 dst_mac e4:12:0:0:0:1  action gact drop

even with:
[root@wsfd-netdev19 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p6p1
NM_CONTROLLED=no

I see:
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 10684 root      20   0    9888    700    584 R  99.3  0.0   0:22.06 tc
  1077 root      20   0  755728 292112   6692 S  53.6  0.3   1:41.64 NetworkManager
                                                 ^^^^ excess usage

It keeps using CPU till the tc -b command ends.
It was consuming ~90% actually up to when this kernel patch was applied: https://bugzilla.redhat.com/show_bug.cgi?id=1744829
Point being, the % itself may be flaky but NM is actively working.

I didn't try on RHEL8/Fedora yet.

Version-Release number of selected component (if applicable):
NetworkManager-1.18.0-6.el7.x86_64

Comment 2 Marcelo Ricardo Leitner 2019-12-19 19:20:13 UTC
Created attachment 1646656 [details]
Flamegraph during the insert test

Here we can use up to 6 CPUs to process it. Yes, NM is not a big part of it, yet it's arguable useless work being done. Remember that each message ignored by NM userspace is at least a skb cloned in kernel.

Comment 3 Marcelo Ricardo Leitner 2019-12-19 19:21:09 UTC
Created attachment 1646657 [details]
Same flamegraph, but wider/more readable

Comment 4 Marcelo Ricardo Leitner 2019-12-19 19:24:02 UTC
Attachments from comment #2 and #3 are for tests with OvS, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1785040#c0

Sorry, I had forgot this bz and commented on this one thinking it was the new one.
Will fix it now closing the new one as dupe of this.

Comment 5 Marcelo Ricardo Leitner 2019-12-19 19:24:55 UTC
*** Bug 1785040 has been marked as a duplicate of this bug. ***

Comment 6 Marcelo Ricardo Leitner 2019-12-19 19:30:13 UTC
One idea to fix this is, from OvS side, is to create a new flag for when adding new filters, one that silence broadcasting such event.
AFAIK OvS doesn't rely on the event (even though it probably should).
Although IMHO the probability of this being rejected upstream is quite high, because it's like saying "hey please skip this packet from tcpdumps, just sneak it through!".

Comment 7 Marcelo Ricardo Leitner 2019-12-20 11:51:18 UTC
(In reply to Marcelo Ricardo Leitner from comment #6)
> One idea to fix this is, from OvS side, is to create a new flag for when
> adding new filters, one that silence broadcasting such event.
> AFAIK OvS doesn't rely on the event (even though it probably should).

Actually it does rely on it, and the broadcast would need to be turned into a unicast only.
ovs' parse_netlink_to_tc_flower() parses it entirely.

Comment 8 Thomas Haller 2019-12-20 16:17:45 UTC
> One idea to fix this is, from OvS side, is to create a new flag for when adding new filters, one that silence broadcasting such event.

if these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter show`), than that seems very wrong.

The purpose of the netlink notifications it to notify about the content of the configured "objects", so not to have to fetch them all. Not emitting events defeats the purpose of why they exist.

Comment 9 Marcelo Ricardo Leitner 2019-12-20 17:27:42 UTC
(In reply to Thomas Haller from comment #8)
> > One idea to fix this is, from OvS side, is to create a new flag for when adding new filters, one that silence broadcasting such event.
> 
> if these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter
> show`), than that seems very wrong.

They are.

> 
> The purpose of the netlink notifications it to notify about the content of
> the configured "objects", so not to have to fetch them all. Not emitting
> events defeats the purpose of why they exist.

Agreed. It at very least would hinder debug-ability.

Comment 10 Davide Caratti 2020-06-09 14:48:24 UTC
(In reply to Marcelo Ricardo Leitner from comment #3)
> Created attachment 1646657 [details]
> Same flamegraph, but wider/more readable

it's strange that tcf_pedit_dump() takes many more cycles compared
to other TC actions. Probably this is something that can improve
reducing the lock contention with the traffic plane (e.g. with the
RCU-ification)? that would improve the flame graphs - and reduce
CPU usage by some percent points. This would also allow skipping
per-cpu counters allocation (and dump) [1], thus obtaining another
(small, but maybe visible) speedup.

do you think it's worth a try?
-- 
davide

[1] https://lore.kernel.org/netdev/20191022141804.27639-1-vladbu@mellanox.com/

Comment 11 Marcelo Ricardo Leitner 2020-06-09 15:18:29 UTC
(In reply to Davide Caratti from comment #10)
> do you think it's worth a try?

Yes! For both PoV, memory and CPU savings (the latter which is two-fold, on per-cpu processing and avoiding spinlock contention). :)

Comment 17 Gris Ge 2021-03-02 06:11:03 UTC
Hi Marcelo,

Is there any action plan for this bug?

Comment 18 Marcelo Ricardo Leitner 2021-03-02 15:07:28 UTC
Hi Gris. I don't know. I'm not that involved with NM development. Maybe Antonio knows better.

Comment 21 Gris Ge 2021-03-24 04:20:11 UTC
Hi Thomas,

Will this bug been solved as side effect of https://bugzilla.redhat.com/show_bug.cgi?id=1847125 ?

Comment 22 RHEL Program Management 2021-03-31 07:30:51 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 24 Marcelo Ricardo Leitner 2021-07-09 19:32:55 UTC
Can we make this bug public?

Comment 25 Till Maas 2021-07-09 19:55:45 UTC
(In reply to Marcelo Ricardo Leitner from comment #24)
> Can we make this bug public?

It seems to me that you made it private. I don't see a reason why it should be private.

Comment 26 Marcelo Ricardo Leitner 2021-07-09 20:46:41 UTC
Oh! Lets flip it, then. Thanks.

Comment 28 Beniamino Galvani 2021-09-15 08:10:59 UTC
Created attachment 1823201 [details]
Reproducer

Reproducer script to check NM CPU usage with many tc filters.

Result with current git main branch of NM:

    # ./rh1753677.sh
    * Setting up interface...
    * Inserting filters...
    * Waiting for NM to settle...

    NM execution time:
    User:    11760 ms
    Kernel:   3800 ms
    Total:   15560 ms

With branch `bg/tc-no-cache`:

    # ./rh1753677.sh
    * Setting up interface...
    * Inserting filters...
    * Waiting for NM to settle...

    NM execution time:
    User:        0 ms
    Kernel:      0 ms
    Total:       0 ms

Comment 34 errata-xmlrpc 2022-05-10 14:54:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1985