Bug 1872618 - nmstate sometimes fails to create desired connection when attaching many networks at once
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: nmstate
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Gris Ge
QA Contact: Mingyu Shi
URL:
Whiteboard:
Depends On:
Blocks: 1916073
 
Reported: 2020-08-26 08:29 UTC by Michael Burman
Modified: 2021-05-18 15:18 UTC

Fixed In Version: nmstate-1.0.1-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1916073
Environment:
Last Closed: 2021-05-18 15:17:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System: Github
ID: nmstate/nmstate pull 1473
Private: 0
Priority: None
Status: closed
Summary: nm: Use fallback checker on profile deactivation and delete
Last Updated: 2021-02-01 09:50:00 UTC

Description Michael Burman 2020-08-26 08:29:05 UTC
Description of problem:
nmstate sometimes fails to create the desired connection when attaching many networks at once

The RHV network team has started testing nmstate scaling scenarios:
Attaching 100 VLANs to 1 NIC
Attaching 200 VLANs to 1 NIC
Attaching 100 VLANs to 2 NICs
Attaching 200 VLANs to 2 NICs

These tests usually pass, but have already failed twice with this error:

MainProcess|jsonrpc/5::ERROR::2020-08-25 17:53:06,010::supervdsm_server::97::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 95, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/network/api.py", line 241, in setupNetworks
    _setup_networks(networks, bondings, options, net_info)
  File "/usr/lib/python3.6/site-packages/vdsm/network/api.py", line 266, in _setup_networks
    networks, bondings, options, net_info, in_rollback
  File "/usr/lib/python3.6/site-packages/vdsm/network/netswitch/configurator.py", line 154, in setup
    _setup_nmstate(networks, bondings, options, in_rollback)
  File "/usr/lib/python3.6/site-packages/vdsm/network/netswitch/configurator.py", line 199, in _setup_nmstate
    nmstate.setup(desired_state, verify_change=not in_rollback)
  File "/usr/lib/python3.6/site-packages/vdsm/network/nmstate/api.py", line 48, in setup
    state_apply(desired_state, verify_change=verify_change)
  File "/usr/lib/python3.6/site-packages/libnmstate/deprecation.py", line 40, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/nmclient.py", line 96, in wrapped
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 73, in apply
    state.State(desired_state), verify_change, commit, rollback_timeout
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 163, in _apply_ifaces_state
    con_profiles=ifaces_add_configs + ifaces_edit_configs,
  File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 232, in _setup_providers
    mainloop.run(timeout=MAINLOOP_TIMEOUT)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/nmclient.py", line 177, in run
    f"Unexpected failure of libnm when running the mainloop: {err}"


Version-Release number of selected component (if applicable):
nmstate-0.2.10-1.el8.noarch
NetworkManager-1.22.8-5.el8_2.x86_64

How reproducible:
From time to time when running RHV nmstate scale tests. Looks like a race.

Steps to Reproduce:
1. Attach multiple VLAN networks (100-400) to a single NIC on one host
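
For readers who want to approximate this step outside RHV, the following is a minimal sketch of an nmstate-style desired state with many VLANs on one NIC. It is only an illustration: RHV goes through vdsm's setupNetworks, which builds its own state, and the function name, the NIC name "eth1", and the starting VLAN ID are assumptions that do not come from this report.

```python
import json

MAX_VLAN_ID = 4094  # highest usable 802.1Q VLAN ID


def vlan_scale_state(base_iface, count, first_vlan_id=100):
    """Build a desired state with `count` VLAN interfaces on `base_iface`.

    Hypothetical helper for illustration; not part of nmstate or vdsm.
    """
    last_vlan_id = first_vlan_id + count - 1
    if first_vlan_id < 1 or last_vlan_id > MAX_VLAN_ID:
        raise ValueError("VLAN IDs must stay within 1..4094")

    interfaces = []
    for vlan_id in range(first_vlan_id, last_vlan_id + 1):
        interfaces.append(
            {
                "name": f"{base_iface}.{vlan_id}",
                "type": "vlan",
                "state": "up",
                "vlan": {"base-iface": base_iface, "id": vlan_id},
            }
        )
    return {"interfaces": interfaces}


# "eth1" is an assumed NIC name; 100 VLANs matches the smallest scenario above.
state = vlan_scale_state("eth1", 100)
print(json.dumps(state["interfaces"][0], indent=2))
```

The resulting dictionary could then be handed to nmstate on a test host; for the two-NIC scenarios, the interface lists of two such states can simply be concatenated.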

Actual results:
Fails from time to time on the nmstate side

Expected results:
Should pass

Additional info:
I will try to reproduce with NM logging at trace level.

Comment 2 Gris Ge 2020-09-01 17:14:33 UTC
Hi Michael,

I noticed this bug is reported against NM 1.22 and nmstate 0.2.10.
I don't think I can do anything on RHEL 8.2.

Can you try it on RHEL 8.3?
    NetworkManager-1.26.0-6.el8.x86_64
    nmstate-0.3.4-12.el8.noarch

I tried in my VM 10+ times, no failure found.

Thank you very much!

Comment 3 Michael Burman 2020-09-02 10:58:08 UTC
(In reply to Gris Ge from comment #2)
> Hi Michael,
> 
> I noticed this bug is reported against NM 1.22 and nmstate 0.2.10.
> I don't think I can do anything on RHEL 8.2.
I understand
> 
> Can you try it on RHEL 8.3?
>     NetworkManager-1.26.0-6.el8.x86_64
>     nmstate-0.3.4-12.el8.noarch
> 
> I tried in my VM 10+ times, no failure found.
> 
> Thank you very much!

Hi Gris,
So far I haven't managed to reproduce it on RHEL 8.3 with nmstate 0.3, only with 0.2.
RHV is still using RHEL 8.2 and nmstate-0.2, so it reproduces with our new scale tests.
With 0.3 it hasn't failed yet.
RHV hasn't moved to 8.3 yet, so these tests are not running much right now.

Comment 4 Gris Ge 2020-09-02 14:18:46 UTC
(In reply to Michael Burman from comment #3)

Thanks for the info.
Let's keep this bug open to track your scale tests on RHEL 8.3/8.4.

Comment 5 Michael Burman 2020-09-08 07:17:03 UTC
Hi Gris

As I mentioned in the email, the issue has now been seen and reproduced on nmstate-0.3.4-12.el8.noarch.

Comment 11 Gris Ge 2021-01-13 14:08:32 UTC
To reproduce this problem locally, create 1000 VLANs on a single veth/eth interface and 1000 bridges, one on top of each VLAN.

The fix has been merged upstream: https://github.com/nmstate/nmstate/pull/1468
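
The local reproducer described above could be sketched roughly as follows. This is a hedged illustration rather than the script actually used: the function name, the base interface name "veth0", and the "br<N>" bridge naming are assumptions, and the base interface is expected to exist already.

```python
def bridge_per_vlan_state(base_iface, count, first_vlan_id=1):
    """Build a desired state with one VLAN and one Linux bridge per VLAN ID.

    Hypothetical helper for illustration; each bridge gets exactly one port,
    the VLAN interface created alongside it.
    """
    interfaces = []
    for vlan_id in range(first_vlan_id, first_vlan_id + count):
        vlan_name = f"{base_iface}.{vlan_id}"
        # VLAN interface on top of the shared base interface.
        interfaces.append(
            {
                "name": vlan_name,
                "type": "vlan",
                "state": "up",
                "vlan": {"base-iface": base_iface, "id": vlan_id},
            }
        )
        # Linux bridge whose only port is that VLAN interface.
        interfaces.append(
            {
                "name": f"br{vlan_id}",
                "type": "linux-bridge",
                "state": "up",
                "bridge": {"port": [{"name": vlan_name}]},
            }
        )
    return {"interfaces": interfaces}


# "veth0" is an assumed name for a pre-existing veth/eth interface.
repro_state = bridge_per_vlan_state("veth0", 1000)
print(len(repro_state["interfaces"]), "interfaces in the desired state")
```

Applying a state of this size in a single transaction is what puts pressure on the NetworkManager main loop, so a smaller `count` is a reasonable first step when testing on a slow VM.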

Comment 15 Mingyu Shi 2021-01-14 10:20:27 UTC
version:
nmstate-1.0.1-0.20210113113732141851.pr1468.21.g8a33e52.el8.noarch
nispor-1.0.1-2.el8.x86_64
NetworkManager-1.30.0-0.5.el8.x86_64 

Hi Gris, creation succeeded (1000 bridges, each with 1 VLAN attached), but setting them all to absent timed out:
***
2021-01-14 17:27:14,184 root         DEBUG    Interface vebase.385 rollback succeeded
2021-01-14 17:27:14,185 root         DEBUG    Interface vebase.379 rollback succeeded
2021-01-14 17:27:14,185 root         DEBUG    Async action: Rollback to checkpoint /org/freedesktop/NetworkManager/Checkpoint/2 finished
2021-01-14 17:27:14,239 root         ERROR    BUG: NM.Client is not cleaned
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.0.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 70, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 280, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 320, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 71, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 104, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 190, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 60, in apply_config
    profile.do_action(action)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profile.py", line 360, in do_action
    self._deactivate()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profile.py", line 305, in _deactivate
    self._ctx, self._iface.name, self._iface.type, self._nm_ac
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/active_connection.py", line 359, in run
    self._ctx.register_async(action)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 144, in register_async
    self.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 221, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Deactivate profile: br280 linux-bridge failed: error=g-io-error-quark: Timeout was reached (24)
real    2m30.105s
user    0m50.738s
sys     0m2.521s


Tried a second time; it succeeded with
real    6m10.169s
user    1m11.627s
sys     0m5.155s

Creation took about 17m26s.
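
For reference, "setting them all to absent" means applying a state in which every interface is marked absent. A minimal sketch of building such a teardown state is shown below; the helper name and the two example interfaces are illustrative assumptions, with names chosen only to resemble those in the log above.

```python
def absent_state(interfaces):
    """Return a state that marks each (name, type) pair as absent.

    Hypothetical helper for illustration; only the name, type, and state
    keys are kept, since no other settings matter for removal.
    """
    return {
        "interfaces": [
            {"name": name, "type": iface_type, "state": "absent"}
            for name, iface_type in interfaces
        ]
    }


# Example input only; a real run would list all 1000 bridges and VLANs.
teardown = absent_state([("br280", "linux-bridge"), ("vebase.280", "vlan")])
print(teardown)
```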

Comment 16 Gris Ge 2021-01-14 13:34:34 UTC
I will continue my work on the profile deactivation and deletion.

Comment 17 Gris Ge 2021-01-15 02:13:43 UTC
Hi Mingyu,

Please try `dnf copr enable packit/nmstate-nmstate-1473`.

Comment 20 Mingyu Shi 2021-02-01 09:49:47 UTC
Verified with versions:
nmstate-1.0.2-0.1.el8.noarch
nispor-1.0.1-2.el8.x86_64
NetworkManager-1.30.0-0.8.el8.x86_64

Comment 22 errata-xmlrpc 2021-05-18 15:17:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1748

