Bug 1753677 - High cpu usage while non-controlled interface is mangling tc filters
Summary: High cpu usage while non-controlled interface is mangling tc filters
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.2
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Beniamino Galvani
QA Contact: Matej Berezny
URL:
Whiteboard:
Duplicates: 1785040
Depends On:
Blocks:
 
Reported: 2019-09-19 14:59 UTC by Marcelo Ricardo Leitner
Modified: 2021-10-11 12:54 UTC
CC: 15 users

Fixed In Version: NetworkManager-1.34.0-0.2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-31 07:30:51 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
Flamegraph during the insert test (188.65 KB, image/svg+xml)
2019-12-19 19:20 UTC, Marcelo Ricardo Leitner
Same flamegraph, but wider/more readable (954.18 KB, image/svg+xml)
2019-12-19 19:21 UTC, Marcelo Ricardo Leitner
Reproducer (1.88 KB, application/x-shellscript)
2021-09-15 08:10 UTC, Beniamino Galvani


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1753684 0 high CLOSED Excessive memory usage while dealing with a flood of kernel messages 2021-09-17 09:57:39 UTC
freedesktop.org Gitlab NetworkManager/NetworkManager-ci - merge_requests 850 0 None merged tc: tc_device_filter_management test 2021-10-08 11:25:19 UTC
freedesktop.org Gitlab NetworkManager/NetworkManager - merge_requests 981 0 None merged platform: don't listen for tc netlink messages 2021-09-20 11:38:52 UTC

Internal Links: 1753684

Description Marcelo Ricardo Leitner 2019-09-19 14:59:14 UTC
Description of problem:
The test consists of adding 1M tc filter rules, e.g.:
# tc qdisc add dev p6p1 ingress
# tc -b add.0
where add.0 contains:
[root@wsfd-netdev19 ~]# head add.0
filter add dev p6p1 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0  action gact drop
filter add dev p6p1 protocol ip ingress prio 1 handle 2 flower skip_hw src_mac e4:11:0:0:0:1 dst_mac e4:12:0:0:0:1  action gact drop
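A batch file in that format can be produced with a short shell loop. This is only a sketch (the reporter's actual tooling is not attached; the file name add.0 and the MAC pattern come from the report, the generator itself is hypothetical):

```shell
# Sketch of a generator for a batch file like add.0. Only the last MAC
# octet is varied here, so N must stay <= 256; the real 1M-rule test
# would need to spread the counter over several octets.
DEV=p6p1
N=10
i=0
: > add.0
while [ "$i" -lt "$N" ]; do
    # Encode the rule index into the last octet of both MACs (hex)
    hex=$(printf '%x' "$i")
    printf 'filter add dev %s protocol ip ingress prio 1 handle %d flower skip_hw src_mac e4:11:0:0:0:%s dst_mac e4:12:0:0:0:%s action gact drop\n' \
        "$DEV" "$((i + 1))" "$hex" "$hex" >> add.0
    i=$((i + 1))
done
```

The file is then applied in one batch with `tc -b add.0`, which is what makes the insert fast enough to flood any netlink listener with events.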

even with:
[root@wsfd-netdev19 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p6p1
NM_CONTROLLED=no

I see:
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 10684 root      20   0    9888    700    584 R  99.3  0.0   0:22.06 tc
  1077 root      20   0  755728 292112   6692 S  53.6  0.3   1:41.64 NetworkManager
                                                 ^^^^ excess usage

It keeps using CPU until the tc -b command ends.
It was actually consuming ~90% until this kernel patch was applied: https://bugzilla.redhat.com/show_bug.cgi?id=1744829
The point being: the percentage itself may be flaky, but NM is actively doing work.

I didn't try on RHEL8/Fedora yet.

Version-Release number of selected component (if applicable):
NetworkManager-1.18.0-6.el7.x86_64

Comment 2 Marcelo Ricardo Leitner 2019-12-19 19:20:13 UTC
Created attachment 1646656 [details]
Flamegraph during the insert test

Here we can use up to 6 CPUs to process it. Yes, NM is not a big part of it, yet it's arguably useless work being done. Remember that each message ignored by NM in userspace is at least one skb cloned in the kernel.

Comment 3 Marcelo Ricardo Leitner 2019-12-19 19:21:09 UTC
Created attachment 1646657 [details]
Same flamegraph, but wider/more readable

Comment 4 Marcelo Ricardo Leitner 2019-12-19 19:24:02 UTC
Attachments from comment #2 and #3 are for tests with OvS, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1785040#c0

Sorry, I had forgotten this bz and commented on this one thinking it was the new one.
Will fix that now by closing the new one as a dupe of this one.

Comment 5 Marcelo Ricardo Leitner 2019-12-19 19:24:55 UTC
*** Bug 1785040 has been marked as a duplicate of this bug. ***

Comment 6 Marcelo Ricardo Leitner 2019-12-19 19:30:13 UTC
One idea to fix this, from the OvS side, is to create a new flag for use when adding new filters, one that silences broadcasting of the corresponding event.
AFAIK OvS doesn't rely on the event (even though it probably should).
Although IMHO the probability of this being rejected upstream is quite high, because it's like saying "hey, please skip this packet in tcpdumps, just sneak it through!".

Comment 7 Marcelo Ricardo Leitner 2019-12-20 11:51:18 UTC
(In reply to Marcelo Ricardo Leitner from comment #6)
> One idea to fix this, from the OvS side, is to create a new flag for use
> when adding new filters, one that silences broadcasting of the
> corresponding event.
> AFAIK OvS doesn't rely on the event (even though it probably should).

Actually it does rely on it, and the broadcast would need to be turned into a unicast only.
OvS's parse_netlink_to_tc_flower() parses it entirely.

Comment 8 Thomas Haller 2019-12-20 16:17:45 UTC
> One idea to fix this, from the OvS side, is to create a new flag for use when adding new filters, one that silences broadcasting of the corresponding event.

if these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter show`), then that seems very wrong.

The purpose of the netlink notifications is to notify about the content of the configured "objects", so as not to have to fetch them all. Not emitting events defeats the purpose of their existence.
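These notifications are easy to observe with `tc monitor`, which joins the same RTNLGRP_TC multicast stream that NetworkManager was subscribed to at the time. A sketch (needs root and iproute2, so it guards and skips otherwise; the dummy interface name nmtest0 is made up):

```shell
# Demonstrate that each filter add is broadcast as a tc netlink event:
# `tc monitor` prints the same event stream any subscribed daemon
# (like NM at the time of this bug) had to receive and parse.
# Guarded: skips unless run as root with iproute2 available.
if [ "$(id -u)" -eq 0 ] && command -v tc >/dev/null 2>&1; then
    ip link add nmtest0 type dummy
    tc qdisc add dev nmtest0 ingress

    tc monitor > tc-events.log &   # collect broadcast tc events
    MONPID=$!
    sleep 1

    tc filter add dev nmtest0 protocol ip ingress prio 1 flower skip_hw \
        src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 action gact drop

    sleep 1
    kill "$MONPID"
    cat tc-events.log              # the added filter shows up here
    ip link del nmtest0
else
    echo "SKIP: needs root and iproute2"
fi
DONE=1
```

Multiply that single event by the 1M rules of the batch insert and you get the message flood NM was spending CPU on.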

Comment 9 Marcelo Ricardo Leitner 2019-12-20 17:27:42 UTC
(In reply to Thomas Haller from comment #8)
> > One idea to fix this is, from OvS side, is to create a new flag for when adding new filters, one that silence broadcasting such event.
> 
> if these filters are otherwise obtainable via RTM_GETTFILTER (`tc filter
> show`), then that seems very wrong.

They are.

> 
> The purpose of the netlink notifications is to notify about the content of
> the configured "objects", so as not to have to fetch them all. Not emitting
> events defeats the purpose of their existence.

Agreed. At the very least it would hinder debuggability.

Comment 10 Davide Caratti 2020-06-09 14:48:24 UTC
(In reply to Marcelo Ricardo Leitner from comment #3)
> Created attachment 1646657 [details]
> Same flamegraph, but wider/more readable

it's strange that tcf_pedit_dump() takes many more cycles compared
to other TC actions. Probably this is something that can be improved
by reducing the lock contention with the traffic plane (e.g. with
RCU-ification)? That would improve the flame graphs and reduce CPU
usage by a few percentage points. It would also allow skipping the
per-CPU counter allocation (and dump) [1], thus obtaining another
(small, but maybe visible) speedup.

do you think it's worth a try?
-- 
davide

[1] https://lore.kernel.org/netdev/20191022141804.27639-1-vladbu@mellanox.com/

Comment 11 Marcelo Ricardo Leitner 2020-06-09 15:18:29 UTC
(In reply to Davide Caratti from comment #10)
> do you think it's worth a try?

Yes! From both points of view, memory and CPU savings (the latter being two-fold: less per-CPU processing and avoiding spinlock contention). :)

Comment 17 Gris Ge 2021-03-02 06:11:03 UTC
Hi Marcelo,

Is there any action plan for this bug?

Comment 18 Marcelo Ricardo Leitner 2021-03-02 15:07:28 UTC
Hi Gris. I don't know. I'm not that involved with NM development. Maybe Antonio knows better.

Comment 21 Gris Ge 2021-03-24 04:20:11 UTC
Hi Thomas,

Will this bug be solved as a side effect of https://bugzilla.redhat.com/show_bug.cgi?id=1847125 ?

Comment 22 RHEL Program Management 2021-03-31 07:30:51 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 24 Marcelo Ricardo Leitner 2021-07-09 19:32:55 UTC
Can we make this bug public?

Comment 25 Till Maas 2021-07-09 19:55:45 UTC
(In reply to Marcelo Ricardo Leitner from comment #24)
> Can we make this bug public?

It seems to me that you made it private. I don't see a reason why it should be private.

Comment 26 Marcelo Ricardo Leitner 2021-07-09 20:46:41 UTC
Oh! Let's flip it, then. Thanks.

Comment 28 Beniamino Galvani 2021-09-15 08:10:59 UTC
Created attachment 1823201 [details]
Reproducer

Reproducer script to check NM CPU usage with many tc filters.

Result with current git main branch of NM:

    # ./rh1753677.sh
    * Setting up interface...
    * Inserting filters...
    * Waiting for NM to settle...

    NM execution time:
    User:    11760 ms
    Kernel:   3800 ms
    Total:   15560 ms

With branch `bg/tc-no-cache`:

    # ./rh1753677.sh
    * Setting up interface...
    * Inserting filters...
    * Waiting for NM to settle...

    NM execution time:
    User:        0 ms
    Kernel:      0 ms
    Total:       0 ms
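The "NM execution time" figures above can be cross-checked independently by reading the accumulated utime/stime of the daemon from /proc. A sketch of such a helper (hypothetical, not the attached reproducer; field positions per proc(5)):

```shell
# Print the accumulated user/kernel CPU time of a process in ms, read
# from /proc/<pid>/stat (fields 14 and 15, in clock ticks). Pass a PID,
# e.g. the NetworkManager PID; defaults to this shell for demonstration.
PID=${1:-$$}
CLK_TCK=$(getconf CLK_TCK)      # clock ticks per second, usually 100

stat=$(cat "/proc/$PID/stat")
rest=${stat##*) }               # drop "pid (comm) " so spaces in the
set -- $rest                    # command name can't shift the fields
utime_ms=$(( ${12} * 1000 / CLK_TCK ))  # field 14 overall = field 12 here
stime_ms=$(( ${13} * 1000 / CLK_TCK ))  # field 15 overall = field 13 here

printf 'User:   %6d ms\n' "$utime_ms"
printf 'Kernel: %6d ms\n' "$stime_ms"
printf 'Total:  %6d ms\n' "$(( utime_ms + stime_ms ))"
```

Run it against `$(pidof NetworkManager)` before and after the batch insert and subtract, which is essentially what the reproducer's summary reports.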

