Hi Thomas, please provide a way to measure this performance improvement. Thank you!
Gris, please update the summary with the exact scenario that is going to be improved, and also state it in a comment, since it now has devel ack. The original proposal was DHCPv4 with 3000 devices, but AFAIU it is now about 1000 devices and no DHCP. Thank you.
We have a test for 500 VLANs with DHCPv4, and it takes some time to set it all up: https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/726 We can easily move it to 1000+.
This RHV use case could be the performance baseline: when creating 1000 VLANs on top of eth1 (a pre-created veth) and 1000 bridges, one over each VLAN, nmstate takes 10m38.439s. Versions: NetworkManager-1.30.0-2.el8.x86_64, nmstate-1.0.2-5.el8.noarch. Trace logging disabled.
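To make the baseline scenario easier to reproduce, here is a minimal sketch that generates an nmstate-style desired state with one VLAN and one Linux bridge per VLAN ID. The interface names (`eth1.<id>`, `br<id>`), the starting VLAN ID, and the helper `build_state` are assumptions for illustration; the comment above does not say which names or IDs were used.

```python
import json


def build_state(base_iface="eth1", count=1000, first_vlan_id=101):
    """Build a desired state with `count` VLANs on `base_iface` and one
    bridge on top of each VLAN. Names and IDs are hypothetical."""
    interfaces = []
    for vlan_id in range(first_vlan_id, first_vlan_id + count):
        vlan_name = f"{base_iface}.{vlan_id}"
        # VLAN device on top of the pre-created base interface.
        interfaces.append({
            "name": vlan_name,
            "type": "vlan",
            "state": "up",
            "vlan": {"base-iface": base_iface, "id": vlan_id},
        })
        # Bridge with the VLAN device as its only port.
        interfaces.append({
            "name": f"br{vlan_id}",
            "type": "linux-bridge",
            "state": "up",
            "bridge": {"port": [{"name": vlan_name}]},
        })
    return {"interfaces": interfaces}


if __name__ == "__main__":
    # Print a tiny two-VLAN example; use count=1000 for the full scenario.
    print(json.dumps(build_state(count=2), indent=2))
```

The resulting document could be saved to a file and applied with nmstate under `time` to compare against the 10m38s figure, assuming the same package versions and logging settings.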
Gris, please also add the goal that needs to be achieved to consider this feature being implemented and update the summary accordingly. Thanks.
20% improvement is good enough for me.
I did some optimizations:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/890
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/894
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/900

These will soon be merged to nm-1-32 and will then reach rhel-8.5.

These optimizations mainly make frequently called code run faster. What they don't do is change the callers to invoke that code less often. As such, they optimize some lower layers, which was simpler but is limited in effectiveness. When we call a function millions of times, making that function fast helps, but not calling it that often helps more (and is often harder). One rework of the higher layers is already in progress with the Layer3Config rework, so I did not want to address that here.

Some testing. The test scripts are committed to git: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/tree/26090bafc9fd2eceeccffb758937427ae5dd160b

I created a setup with

    sudo DO_ADD_VLAN_CON=1 NUM_DEVS=1 NUM_VLAN_DEVS=1000 contrib/scripts/test-create-many-device-setup.sh setup

(NetworkManager.dispatcher disabled and dns=none), then I ran

    #1 time examples/python/gi/nm-up-many.py c-a1.{1..1000}-po
    #2 time examples/python/gi/nm-up-many.py c-a1.{1..1000}-po
    #3 time examples/python/gi/nm-up-many.py c-a1.{1..100}-po
    #4 time examples/python/gi/nm-up-many.py c-a1.{1..200}-po

This only activates the ports and does not wait for the bridges to get their IP addresses.

Timings:

    Test#  1.32.0  <new>  diff %
    #1     484     472     -2.48
    #2     998     827    -17.13
    #3      72      60    -16.67
    #3      42      25    -40.48
    #3      40      25    -37.50
    #3      36      35     -2.78
    #4     137     125     -8.76
    #4     218      87    -60.09
    #4     115      78    -32.17

We see large fluctuations, but I think it is about 20% better :)

In the future, we need to address the higher layers to improve performance significantly, beyond a low percentage number.
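The "diff %" column above is the relative change of the new timing against the 1.32.0 timing. As a small sketch (the names `RUNS` and `percent_change` are made up here, and the numbers are copied from the timings in this comment), the column and an overall average can be recomputed like this:

```python
# (test label, 1.32.0 timing, <new> timing), copied from the comment above.
RUNS = [
    ("#1", 484, 472),
    ("#2", 998, 827),
    ("#3", 72, 60),
    ("#3", 42, 25),
    ("#3", 40, 25),
    ("#3", 36, 35),
    ("#4", 137, 125),
    ("#4", 218, 87),
    ("#4", 115, 78),
]


def percent_change(old, new):
    """Relative change of `new` against `old`, in percent (negative = faster)."""
    return (new - old) / old * 100.0


changes = [percent_change(old, new) for _, old, new in RUNS]
mean_change = sum(changes) / len(changes)

for (label, old, new), change in zip(RUNS, changes):
    print(f"{label:<4} {old:>5} {new:>5} {change:>8.2f}")
print(f"mean of per-run changes: {mean_change:.1f}%")
```

An unweighted mean of the per-run changes comes out a little above 20% faster, which fits the rough estimate given how much the individual runs fluctuate. Averaging per-run percentages is only one possible summary; more repetitions per test would give a steadier number.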
This was still useful to look at valgrind runs, and to write some test scripts.
I added a test that adds 2000 connections together via libnm. It runs in about 540s on 1.30 and 400s on 1.32 (on average), which is at least a 20% improvement, so verifying.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4361