Bug 1872618
| Summary: | nmstate sometimes fail to create desired connection when attaching many networks at once | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Michael Burman <mburman> | |
| Component: | nmstate | Assignee: | Gris Ge <fge> | |
| Status: | CLOSED ERRATA | QA Contact: | Mingyu Shi <mshi> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 8.2 | CC: | dholler, ferferna, fge, jiji, jishi, jwboyer, network-qe, till | |
| Target Milestone: | rc | Keywords: | Triaged, ZStream | |
| Target Release: | 8.4 | Flags: | pm-rhel: mirror+ | |
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | nmstate-1.0.1-1.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1916073 | Environment: | ||
| Last Closed: | 2021-05-18 15:17:12 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1916073 | |||
Hi Michael,
I noticed this bug is reported against NM 1.22 and nmstate 0.2.10.
I don't think I can do anything on RHEL 8.2.
Can you try it on RHEL 8.3?
NetworkManager-1.26.0-6.el8.x86_64
nmstate-0.3.4-12.el8.noarch
I tried in my VM 10+ times, no failure found.
Thank you very much!
(In reply to Gris Ge from comment #2)
> Hi Michael,
>
> I noticed this bug is reported against NM 1.22 and nmstate 0.2.10.
> I don't think I can do anything on RHEL 8.2.

I understand.

> Can you try it on RHEL 8.3?
> NetworkManager-1.26.0-6.el8.x86_64
> nmstate-0.3.4-12.el8.noarch
>
> I tried in my VM 10+ times, no failure found.
>
> Thank you very much!

Hi Gris,
So far I have not managed to reproduce it on RHEL 8.3 with nmstate 0.3, only with 0.2.
RHV still uses RHEL 8.2 and nmstate 0.2, where it reproduces with our new scale tests.
With 0.3 it has not failed yet.
RHV has not moved to 8.3 yet, so these tests are not running much right now.

(In reply to Michael Burman from comment #3)
> Hi Gris,
> So far I have not managed to reproduce it on RHEL 8.3 with nmstate 0.3, only
> with 0.2.
> RHV still uses RHEL 8.2 and nmstate 0.2, where it reproduces with our new
> scale tests.
> With 0.3 it has not failed yet.
> RHV has not moved to 8.3 yet, so these tests are not running much right now.

Thanks for the info.
Let's keep this bug open to track your scale test on RHEL 8.3/8.4.

Hi Gris,
As I mentioned in the email, the issue is now seen and reproduced on nmstate-0.3.4-12.el8.noarch.

To reproduce this problem locally, create 1000 VLANs on a single veth/eth interface and 1000 bridges, each using one of the VLANs.

The fix has been merged upstream at:
https://github.com/nmstate/nmstate/pull/1468

Versions:
nmstate-1.0.1-0.20210113113732141851.pr1468.21.g8a33e52.el8.noarch
nispor-1.0.1-2.el8.x86_64
NetworkManager-1.30.0-0.5.el8.x86_64
Hi Gris, creation succeeded (1000 bridges, each with 1 VLAN attached), but setting them all to absent timed out:
***
2021-01-14 17:27:14,184 root DEBUG Interface vebase.385 rollback succeeded
2021-01-14 17:27:14,185 root DEBUG Interface vebase.379 rollback succeeded
2021-01-14 17:27:14,185 root DEBUG Async action: Rollback to checkpoint /org/freedesktop/NetworkManager/Checkpoint/2 finished
2021-01-14 17:27:14,239 root ERROR BUG: NM.Client is not cleaned
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.0.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 70, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 280, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 320, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 71, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 104, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 190, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 60, in apply_config
    profile.do_action(action)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profile.py", line 360, in do_action
    self._deactivate()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profile.py", line 305, in _deactivate
    self._ctx, self._iface.name, self._iface.type, self._nm_ac
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/active_connection.py", line 359, in run
    self._ctx.register_async(action)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 144, in register_async
    self.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 221, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Deactivate profile: br280 linux-bridge failed: error=g-io-error-quark: Timeout was reached (24)
real 2m30.105s
user 0m50.738s
sys 0m2.521s
A second attempt succeeded, with:
real 6m10.169s
user 1m11.627s
sys 0m5.155s
Creation took about 17m26s.

I will continue my work on profile deactivation and deletion.

Hi Mingyu,
Please try `dnf copr enable packit/nmstate-nmstate-1473`.

Verified with versions:
nmstate-1.0.2-0.1.el8.noarch
nispor-1.0.1-2.el8.x86_64
NetworkManager-1.30.0-0.8.el8.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:1748
Description of problem:
nmstate sometimes fails to create the desired connection when attaching many networks at once.
The RHV network team started to test nmstate scaling scenarios:
Attaching 100 VLANs to 1 NIC
Attaching 200 VLANs to 1 NIC
Attaching 100 VLANs to 2 NICs
Attaching 200 VLANs to 2 NICs
These tests usually pass, but have already failed twice with this error:
MainProcess|jsonrpc/5::ERROR::2020-08-25 17:53:06,010::supervdsm_server::97::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 95, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/network/api.py", line 241, in setupNetworks
    _setup_networks(networks, bondings, options, net_info)
  File "/usr/lib/python3.6/site-packages/vdsm/network/api.py", line 266, in _setup_networks
    networks, bondings, options, net_info, in_rollback
  File "/usr/lib/python3.6/site-packages/vdsm/network/netswitch/configurator.py", line 154, in setup
    _setup_nmstate(networks, bondings, options, in_rollback)
  File "/usr/lib/python3.6/site-packages/vdsm/network/netswitch/configurator.py", line 199, in _setup_nmstate
    nmstate.setup(desired_state, verify_change=not in_rollback)
  File "/usr/lib/python3.6/site-packages/vdsm/network/nmstate/api.py", line 48, in setup
    state_apply(desired_state, verify_change=verify_change)
  File "/usr/lib/python3.6/site-packages/libnmstate/deprecation.py", line 40, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/nmclient.py", line 96, in wrapped
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 73, in apply
    state.State(desired_state), verify_change, commit, rollback_timeout
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 163, in _apply_ifaces_state
    con_profiles=ifaces_add_configs + ifaces_edit_configs,
  File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 232, in _setup_providers
    mainloop.run(timeout=MAINLOOP_TIMEOUT)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/nmclient.py", line 177, in run
    f"Unexpected failure of libnm when running the mainloop: {err}"

Version-Release number of selected component (if applicable):
nmstate-0.2.10-1.el8.noarch
NetworkManager-1.22.8-5.el8_2.x86_64

How reproducible:
From time to time when running the RHV nmstate scale tests. Looks like a race.

Steps to Reproduce:
1. Attach multiple VLAN networks (100-400) to a NIC on one host

Actual results:
Fails from time to time on the nmstate side

Expected results:
Should pass

Additional info:
I will try to reproduce with NM at trace log level.
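
For anyone who wants to try the local reproducer mentioned in the comments (1000 VLANs on a single veth/eth interface, plus one bridge per VLAN), here is a rough sketch of how the desired state could be generated. It is not the script used by QE or by the RHV scale tests. The helper name `build_scale_state` is hypothetical, and the interface names (`vebase`, `vebase.N`, `brN`) are assumptions modelled on the names visible in the log above. The dictionary layout follows the nmstate interface schema as I understand it (`vlan` with `base-iface` and `id`, `bridge` with a `port` list); please check it against the schema of your nmstate version before relying on it.

```python
import json


def build_scale_state(count, base_iface="vebase", state="up"):
    """Build an nmstate-style desired state (hypothetical helper).

    Produces `count` VLANs on one base interface and one Linux bridge per
    VLAN, with that VLAN as the bridge's only port. With state="absent",
    only name/type/state are emitted, which is enough to request removal.
    """
    interfaces = []
    for vlan_id in range(1, count + 1):
        vlan_name = f"{base_iface}.{vlan_id}"  # assumed naming, as in the log
        bridge_name = f"br{vlan_id}"           # assumed naming, as in the log
        vlan = {"name": vlan_name, "type": "vlan", "state": state}
        bridge = {"name": bridge_name, "type": "linux-bridge", "state": state}
        if state != "absent":
            vlan["vlan"] = {"base-iface": base_iface, "id": vlan_id}
            bridge["bridge"] = {"port": [{"name": vlan_name}]}
        interfaces += [vlan, bridge]
    return {"interfaces": interfaces}


if __name__ == "__main__":
    # Small preview only; use count=1000 for the scale case from this bug.
    print(json.dumps(build_scale_state(2), indent=2))
    print(json.dumps(build_scale_state(2, state="absent"), indent=2))
```

The generated JSON can be saved to a file and applied with `nmstatectl` (the traceback above goes through its `apply` subcommand on nmstate 1.0.1). Applying the `state="absent"` variant afterwards corresponds to the removal step that hit the deactivation timeout. The base interface itself is assumed to exist already.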