Created attachment 1861388 [details]
NM_trace.log

Description of problem:
Failed to apply the current state of OVS.

Version-Release number of selected component (if applicable):
nmstate-1.2.1-1.el8.x86_64
nispor-1.2.3-1.el8.x86_64
NetworkManager-1.36.0-0.7.el8.x86_64
openvswitch2.11-2.11.3-93.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:

ip link add veth0 type veth peer name veth0_ep
ip link set veth0 up
ip link set veth0_ep up
nmcli device set veth0 managed yes
nmcli device set veth0_ep managed yes

ip link add veth1 type veth peer name veth1_ep
ip link set veth1 up
ip link set veth1_ep up
nmcli device set veth1 managed yes
nmcli device set veth1_ep managed yes

cat << EOF > ovsbr0-nobond.yaml
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: ovs0
    - name: veth0
- name: ovs0
  type: ovs-interface
  state: up
  ipv4:
    enabled: true
    address:
    - ip: 1.1.1.1
      prefix-length: 24
  ipv6:
    enabled: true
    address:
    - ip: 1::1
      prefix-length: 64
EOF
nmstatectl apply ovsbr0-nobond.yaml

cat << EOF > ovsbr0-add-bond.yaml
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: ovs0
    - name: veth0
    - name: ovsbond0
      link-aggregation:
        mode: balance-slb
        port:
        - name: veth1
        - name: dummy0
- name: dummy0
  type: dummy
  state: up
EOF
nmstatectl apply ovsbr0-add-bond.yaml

nmstatectl show ovs0,dummy0,veth[01] | nmstatectl apply

Actual results:

2022-02-16 16:41:22,024 root ERROR Rollback failed with error Activate profile uuid:2b91fb0b-80bb-4393-b530-8b04bd70bcad iface:veth0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_UNKNOWN of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_REMOVED of type NM.DeviceStateReason>
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 366, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 99, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:50fefd30-ae5e-4220-81bd-79158d02fafd iface:veth1 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_UNKNOWN of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_REMOVED of type NM.DeviceStateReason>

Expected results:
No failure.

Additional info:
According to my testing:
1. In the last step, the failure reproduces only when all of "ovs0,dummy0,veth[01]" are included.
2. Repeating the last step multiple times, there is a chance it passes. After that, even if you clean up the environment and run the reproducer again, it still passes. But once you run "systemctl restart NetworkManager", the failure can be reproduced again.
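For reference, a small helper loop for hitting the race more reliably. This is illustrative only and not part of the original report; the attempt count and sleep are arbitrary. It restarts NetworkManager between rounds (per the note above) and re-marks the veth devices as managed, since that runtime setting is lost on restart:

for i in $(seq 1 5); do
    systemctl restart NetworkManager
    sleep 5
    for dev in veth0 veth0_ep veth1 veth1_ep; do
        nmcli device set "$dev" managed yes
    done
    nmstatectl show ovs0,dummy0,veth[01] | nmstatectl apply \
        && echo "attempt $i: passed" || echo "attempt $i: FAILED"
done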
A perhaps more typical usage is changing the MTU:

nmstatectl show ovs0,dummy0,veth[01] | sed 's/mtu: 1500/mtu: 1280/g' | nmstatectl apply

It also fails.
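The same change can also be staged through a file, which makes the modified state easier to inspect before applying. Equivalent commands, shown only for illustration; the file name is arbitrary:

nmstatectl show ovs0,dummy0,veth[01] > current-state.yaml
sed -i 's/mtu: 1500/mtu: 1280/g' current-state.yaml
nmstatectl apply current-state.yaml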
Hi Mingyu,

This is fixed by nmstate-1.3.3-1.el8 (RHEL 8.7). Please check again and close as CURRENTRELEASE if it works for you!
I've found the root cause, and it's another race condition.

Here veth1 is re-activated:

<info> [1677681069.0515] device (veth1): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')

When veth1 transitions to state deactivating, it's detached from the ovs port and a "del-interface" command is sent to ovsdb:

<debug> [1677681069.0519] device[a0fcc72049c189dc] (ovsbond0): slave veth1 state change 100 (activated) -> 110 (deactivating)
<trace> [1677681069.0519] device[a0fcc72049c189dc] (ovsbond0): master: release one slave 1cf98085badeca4e/veth1 (enslaved) (configure)
<info> [1677681069.0519] device (ovsbond0): detaching ovs interface veth1
<trace> [1677681069.0519] ovsdb: call[3a31981dc40a5f15]: new: del-interface interface=veth1

In the meantime, the re-activation of veth1 goes through its state changes:

<info> [1677681069.1280] device (veth1): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
<info> [1677681069.1375] device (veth1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
<info> [1677681069.1380] device (veth1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')

At this point the "del-interface" event is signaled by ovsdb and this spoils the party:

<trace> [1677681069.1390] ovsdb: obj[iface:6a322fb2-b25d-4dee-9d2e-dab25b9f49ee]: removed an 'system' interface: veth1, eea3e660-b1dc-4066-af83-30622b3db641
<info> [1677681069.1390] device (veth1): state change: config -> deactivating (reason 'removed', sys-iface-state: 'managed')

Normally the bug doesn't happen because the ovs reply arrives before the device changes state; the issue is visible only when ovs is slow to reply.

To fix this, ideally NM should wait until the ovs command has completed before moving the device from deactivating to disconnected.
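A quick way to confirm this ordering from a TRACE-level NetworkManager log (the log file name matches the attachment but may differ on your system; the patterns come from the messages quoted above) is to filter the veth1 state changes together with the ovsdb del-interface call and check whether the "removed" transition lands before the ovsdb reply:

# Illustrative only: adjust the path to wherever the trace log was captured.
grep -E "device \(veth1\): state change|ovsdb: .*del-interface interface=veth1" NM_trace.log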
Working well with NetworkManager-1.43.7-1.el9.x86_64. Working on an NMCI test now.