Bug 1847125 - [RFE] Improve 20% performance on creating 1000 bridge over 1000 VLANs
Summary: [RFE] Improve 20% performance on creating 1000 bridge over 1000 VLANs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.5
Assignee: Thomas Haller
QA Contact: Filip Pokryvka
URL:
Whiteboard:
Depends On: 1711215
Blocks: 1935910
Reported: 2020-06-15 18:07 UTC by Thomas Haller
Modified: 2021-11-10 06:42 UTC
CC List: 15 users

Fixed In Version: NetworkManager-1.32.2-1.el8
Doc Type: No Doc Update
Doc Text:
Clone Of: 1711215
Environment:
Last Closed: 2021-11-09 19:28:55 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata: RHSA-2021:4361 (last updated 2021-11-09 19:29:34 UTC)
freedesktop.org GitLab: NetworkManager/NetworkManager-ci merge request 802, status "opened": vlan: add 'vlan_create_1000_bridges_over_1000_vlans' test (last updated 2021-07-29 17:41:32 UTC)

Comment 3 Gris Ge 2021-03-05 04:48:30 UTC
Hi Thomas,

Please provide a way to measure this performance improvement.

Thank you!

Comment 4 Till Maas 2021-03-05 08:28:12 UTC
Gris, please update the summary with the exact scenario that is going to be improved, and also describe it in a comment, since it now has devel ack. The original proposal was DHCPv4 with 3000 devices, but AFAIU it is now about 1000 devices and no DHCP. Thank you.

Comment 5 Vladimir Benes 2021-03-08 12:24:42 UTC
We have a test for 500 VLANs with DHCPv4, and it takes some time to set it all up:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/726

We can easily extend it to 1000+.

Comment 6 Gris Ge 2021-03-08 13:59:33 UTC
This RHV use case could serve as the performance baseline:

When creating 1000 VLANs on eth1 (a pre-created veth) and one bridge over each of those VLANs, nmstate takes 10m38.439s.

NetworkManager-1.30.0-2.el8.x86_64
nmstate-1.0.2-5.el8.noarch
Trace logging disabled.
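For reference, the baseline scenario can be written down as an nmstate desired state. The following is a minimal Python sketch that only builds the state dictionary; the interface names (`eth1.N`, `brN`), and disabling IPv4/IPv6 and STP on the bridges, are assumptions for illustration, not details from the comment. Applying it with `libnmstate.apply()` is left commented out because it reconfigures the host.

```python
import json

BASE_IFACE = "eth1"  # the pre-created veth from the comment


def vlan_bridge_state(base_iface, count):
    """Build an nmstate desired state with `count` VLANs on `base_iface`
    and one Linux bridge on top of each VLAN.

    Interface names (<base>.<id>, br<id>) are hypothetical choices."""
    interfaces = []
    for vid in range(1, count + 1):
        vlan_name = f"{base_iface}.{vid}"
        interfaces.append({
            "name": vlan_name,
            "type": "vlan",
            "state": "up",
            "vlan": {"base-iface": base_iface, "id": vid},
        })
        interfaces.append({
            "name": f"br{vid}",
            "type": "linux-bridge",
            "state": "up",
            # Assumption: no IP configuration and STP off, so the run
            # measures device/profile creation rather than addressing.
            "ipv4": {"enabled": False},
            "ipv6": {"enabled": False},
            "bridge": {
                "options": {"stp": {"enabled": False}},
                "port": [{"name": vlan_name}],
            },
        })
    return {"interfaces": interfaces}


state = vlan_bridge_state(BASE_IFACE, 1000)
print(json.dumps(state["interfaces"][:2], indent=2))  # first VLAN + bridge

# On a disposable test machine with nmstate installed:
#   import libnmstate
#   libnmstate.apply(state)
```

Emitting all 2000 interfaces in one desired state lets nmstate hand NetworkManager a single large transaction, which is the workload the timing above reflects.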

Comment 7 Till Maas 2021-03-08 19:18:41 UTC
Gris, please also add the goal that needs to be achieved to consider this feature being implemented and update the summary accordingly. Thanks.

Comment 8 Gris Ge 2021-03-24 04:39:22 UTC
20% improvement is good enough for me.

Comment 10 Thomas Haller 2021-06-22 07:52:31 UTC
I did some optimizations:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/890
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/894
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/900

These will soon be merged to nm-1-32 and will then reach rhel-8.5.


These optimizations mainly make frequently called code run faster.
What they don't do is change the code so that those functions get called less often.
As such, they optimize some lower layers (which was simpler, but is limited
in effectiveness). When a function is called millions of times, making
that function fast helps. But what helps more is to not call it that often
(though that is also often harder).
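The distinction can be illustrated with a generic, hypothetical Python sketch (this is not NetworkManager code): speeding up `expensive_sync()` is the lower-layer fix, while coalescing events so that it runs once per batch instead of once per event is the kind of higher-layer change described as more effective but harder.

```python
from collections import Counter

calls = Counter()


def expensive_sync(devices):
    """Hypothetical stand-in for a costly routine that walks every device."""
    calls["sync"] += 1
    return sum(1 for _ in devices)  # O(len(devices)) work per call


def handle_eagerly(events):
    devices = []
    for ev in events:
        devices.append(ev)
        expensive_sync(devices)  # once per event: N calls, O(N^2) total work


def handle_batched(events):
    devices = []
    for ev in events:
        devices.append(ev)
    expensive_sync(devices)  # coalesced: a single call, O(N) total work


events = [f"dev{i}" for i in range(1000)]

calls.clear()
handle_eagerly(events)
eager_calls = calls["sync"]

calls.clear()
handle_batched(events)
batched_calls = calls["sync"]

print(eager_calls, batched_calls)  # prints "1000 1"
```

Making `expensive_sync()` twice as fast halves the cost of both variants, but only the batched variant removes the quadratic term.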

A rework of the higher layers is already in progress with the Layer3Config rework,
so I did not want to address that here.




Some testing:

The test scripts are committed to git:

  https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/tree/26090bafc9fd2eceeccffb758937427ae5dd160b

I created a setup with

  sudo DO_ADD_VLAN_CON=1 NUM_DEVS=1 NUM_VLAN_DEVS=1000 contrib/scripts/test-create-many-device-setup.sh setup

(with the NetworkManager-dispatcher service disabled and dns=none set in NetworkManager.conf)

Then I ran:

 #1  time examples/python/gi/nm-up-many.py c-a1.{1..1000}-po
 #2  time examples/python/gi/nm-up-many.py c-a1.{1..1000}-po
 #3  time examples/python/gi/nm-up-many.py c-a1.{1..100}-po
 #4  time examples/python/gi/nm-up-many.py c-a1.{1..200}-po
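The `nm-up-many.py` script referenced above lives in the NetworkManager tree. The following is not that script but a minimal, hypothetical Python sketch of the same idea using libnm through PyGObject (`NM.Client`). The helper names `profile_names()` and `activate_profiles()` are made up for illustration, and the sketch only waits until NetworkManager has answered every activation request, not until the devices reach the activated state.

```python
def profile_names(prefix, suffix, first, last):
    """Expand names the way the shell expands c-a1.{1..1000}-po."""
    return [f"{prefix}{i}{suffix}" for i in range(first, last + 1)]


def activate_profiles(names):
    """Request activation of every named profile in parallel via libnm.

    Returns a list of (name, error) pairs for requests that failed.
    Requires a running NetworkManager and PyGObject with NM typelibs."""
    import gi
    gi.require_version("NM", "1.0")
    from gi.repository import GLib, NM

    client = NM.Client.new(None)
    loop = GLib.MainLoop()
    failures = []
    pending = 0

    def on_done(nm_client, result, name):
        nonlocal pending
        try:
            nm_client.activate_connection_finish(result)
        except GLib.Error as e:
            failures.append((name, e.message))
        pending -= 1
        if pending == 0:
            loop.quit()

    for name in names:
        conn = client.get_connection_by_id(name)
        if conn is None:
            failures.append((name, "no such profile"))
            continue
        pending += 1
        # No specific device or object: let NetworkManager pick the device.
        client.activate_connection_async(conn, None, None, None, on_done, name)

    if pending:
        loop.run()  # callbacks are dispatched from the main loop
    return failures
```

Usage on a prepared test machine would be `activate_profiles(profile_names("c-a1.", "-po", 1, 1000))`, timed externally in the same way as the commands above.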


This only activates the ports; it does not wait for the bridges to get their IP addresses.


Timings:

Test#   1.32.0   <new>   diff %
#1         484     472     -2.5

#2         998     827    -17.1

#3          72      60    -16.7
#3          42      25    -40.5
#3          40      25    -37.5
#3          36      35     -2.8

#4         137     125     -8.8
#4         218      87    -60.1
#4         115      78    -32.2


We see large fluctuations, but I think it is about 20% better :)
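Because the per-run numbers fluctuate so much, it helps to aggregate them. The following Python sketch copies the timings from the table above and computes the per-run change, its mean and median, and the change in total time summed over all runs. Which aggregate is most representative is a judgment call; the sketch simply shows all three.

```python
from statistics import mean, median

# (test, time with 1.32.0, time with <new>), copied from the table above
RUNS = [
    ("#1", 484, 472),
    ("#2", 998, 827),
    ("#3", 72, 60),
    ("#3", 42, 25),
    ("#3", 40, 25),
    ("#3", 36, 35),
    ("#4", 137, 125),
    ("#4", 218, 87),
    ("#4", 115, 78),
]


def pct_change(old, new):
    """Relative change from old to new, in percent (negative = faster)."""
    return (new - old) / old * 100.0


per_run = [pct_change(old, new) for _, old, new in RUNS]
total_old = sum(old for _, old, _ in RUNS)
total_new = sum(new for _, _, new in RUNS)

print(f"mean of per-run diffs:   {mean(per_run):6.1f} %")
print(f"median of per-run diffs: {median(per_run):6.1f} %")
print(f"change in summed time:   {pct_change(total_old, total_new):6.1f} %")
```

With these numbers the summed time drops by about 19% (2142 to 1734), the median per-run change is about -17%, and the mean is about -24%, which is consistent with the "about 20%" estimate.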



In the future, we need to address the higher layers to improve performance significantly beyond these modest percentages.
This work was still useful for looking at valgrind runs and for writing some test scripts.

Comment 14 Filip Pokryvka 2021-07-29 17:41:33 UTC
I added a test that adds 2000 connections together via libnm. It seems to take about 540 s on 1.30 and 400 s on 1.32 (on average), which is at least a 20% improvement, so verifying.
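"20% improvement" can be read either as a 20% reduction in run time or as a 20% speedup. The short Python sketch below checks the averaged 540 s and 400 s figures against the 20% goal under both readings; the function names are illustrative only.

```python
def reduction_pct(before_s, after_s):
    """Relative reduction in wall-clock time, in percent."""
    return (before_s - after_s) / before_s * 100.0


def speedup(before_s, after_s):
    """How many times faster the new version is."""
    return before_s / after_s


r = reduction_pct(540, 400)  # about 25.9 % less time
s = speedup(540, 400)        # 1.35x, i.e. about 35 % faster

print(f"time reduction: {r:.1f} %, speedup: {s:.2f}x")
print("meets 20% goal:", r >= 20.0 and s >= 1.20)
```

Either way the averaged result clears the 20% target agreed in the comments above.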

Comment 16 errata-xmlrpc 2021-11-09 19:28:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361

