Created attachment 1861388 [details]
NM_trace.log

Description of problem:
Failed to apply the current state of OVS.

Version-Release number of selected component (if applicable):
nmstate-1.2.1-1.el8.x86_64
nispor-1.2.3-1.el8.x86_64
NetworkManager-1.36.0-0.7.el8.x86_64
openvswitch2.11-2.11.3-93.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:

ip link add veth0 type veth peer name veth0_ep
ip link set veth0 up
ip link set veth0_ep up
nmcli device set veth0 managed yes
nmcli device set veth0_ep managed yes

ip link add veth1 type veth peer name veth1_ep
ip link set veth1 up
ip link set veth1_ep up
nmcli device set veth1 managed yes
nmcli device set veth1_ep managed yes

cat << EOF > ovsbr0-nobond.yaml
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: ovs0
    - name: veth0
- name: ovs0
  type: ovs-interface
  state: up
  ipv4:
    enabled: true
    address:
    - ip: 1.1.1.1
      prefix-length: 24
  ipv6:
    enabled: true
    address:
    - ip: 1::1
      prefix-length: 64
EOF
nmstatectl apply ovsbr0-nobond.yaml

cat << EOF > ovsbr0-add-bond.yaml
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: ovs0
    - name: veth0
    - name: ovsbond0
      link-aggregation:
        mode: balance-slb
        port:
        - name: veth1
        - name: dummy0
- name: dummy0
  type: dummy
  state: up
EOF
nmstatectl apply ovsbr0-add-bond.yaml

nmstatectl show ovs0,dummy0,veth[01] | nmstatectl apply

Actual results:

2022-02-16 16:41:22,024 root ERROR Rollback failed with error Activate profile uuid:2b91fb0b-80bb-4393-b530-8b04bd70bcad iface:veth0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_UNKNOWN of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_REMOVED of type NM.DeviceStateReason>
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 366, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 99, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:50fefd30-ae5e-4220-81bd-79158d02fafd iface:veth1 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_UNKNOWN of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_REMOVED of type NM.DeviceStateReason>

Expected results:
No failure.

Additional info:
According to my testing:
1. In the last step, the failure reproduces only when all of "ovs0,dummy0,veth[01]" are included.
2. Repeating the last step multiple times, there is a chance it passes. After that, even if you clean up the environment and run the reproducer again, it still passes. But once you run "systemctl restart NetworkManager", the failure can be reproduced again.
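For reference, a small helper loop for hitting the race more reliably. This is illustrative only and not part of the original report; the attempt count and sleep are arbitrary. It restarts NetworkManager between rounds (per the note above) and re-marks the veth devices as managed, since that runtime setting is lost on restart:

for i in $(seq 1 5); do
    systemctl restart NetworkManager
    sleep 5
    for dev in veth0 veth0_ep veth1 veth1_ep; do
        nmcli device set "$dev" managed yes
    done
    nmstatectl show ovs0,dummy0,veth[01] | nmstatectl apply \
        && echo "attempt $i: passed" || echo "attempt $i: FAILED"
done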
A perhaps more typical usage is changing the MTU:

nmstatectl show ovs0,dummy0,veth[01] | sed 's/mtu: 1500/mtu: 1280/g' | nmstatectl apply

It also fails.
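The same change can also be staged through a file, which makes the modified state easier to inspect before applying. Equivalent commands, shown only for illustration; the file name is arbitrary:

nmstatectl show ovs0,dummy0,veth[01] > current-state.yaml
sed -i 's/mtu: 1500/mtu: 1280/g' current-state.yaml
nmstatectl apply current-state.yaml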
Hi Mingyu,

This is fixed by nmstate-1.3.3-1.el8 (RHEL 8.7). Please check again and close as CURRENTRELEASE if it works for you!
I've found the root cause, and it's another race condition.

Here veth1 is re-activated:

<info> [1677681069.0515] device (veth1): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')

When veth1 transitions to state deactivating, it's detached from the ovs port and a "del-interface" command is sent to ovsdb:

<debug> [1677681069.0519] device[a0fcc72049c189dc] (ovsbond0): slave veth1 state change 100 (activated) -> 110 (deactivating)
<trace> [1677681069.0519] device[a0fcc72049c189dc] (ovsbond0): master: release one slave 1cf98085badeca4e/veth1 (enslaved) (configure)
<info> [1677681069.0519] device (ovsbond0): detaching ovs interface veth1
<trace> [1677681069.0519] ovsdb: call[3a31981dc40a5f15]: new: del-interface interface=veth1

In the meantime, the re-activation of veth1 goes through its state changes:

<info> [1677681069.1280] device (veth1): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
<info> [1677681069.1375] device (veth1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
<info> [1677681069.1380] device (veth1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')

At this point the "del-interface" event is signaled by ovsdb and this spoils the party:

<trace> [1677681069.1390] ovsdb: obj[iface:6a322fb2-b25d-4dee-9d2e-dab25b9f49ee]: removed an 'system' interface: veth1, eea3e660-b1dc-4066-af83-30622b3db641
<info> [1677681069.1390] device (veth1): state change: config -> deactivating (reason 'removed', sys-iface-state: 'managed')

Normally the bug doesn't happen because the ovs reply arrives before the device changes state; the issue is visible only when ovs is slow to reply.

To fix this, ideally NM should wait until the ovs command has completed before moving the device from deactivating to disconnected.
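A quick way to confirm this ordering from a TRACE-level NetworkManager log (the log file name matches the attachment but may differ on your system; the patterns come from the messages quoted above) is to filter the veth1 state changes together with the ovsdb del-interface call and check whether the "removed" transition lands before the ovsdb reply:

# Illustrative only: adjust the path to wherever the trace log was captured.
grep -E "device \(veth1\): state change|ovsdb: .*del-interface interface=veth1" NM_trace.log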
Working well with NetworkManager-1.43.7-1.el9.x86_64. Working on an NMCI test now.