Created attachment 1861388: NM_trace.log
Description of problem:
Failed to apply the current state of ovs.
Version-Release number of selected component (if applicable):
nmstate-1.2.1-1.el8.x86_64
nispor-1.2.3-1.el8.x86_64
NetworkManager-1.36.0-0.7.el8.x86_64
openvswitch2.11-2.11.3-93.el8fdp.x86_64
How reproducible:
100%
Steps to Reproduce:
ip link add veth0 type veth peer name veth0_ep
ip link set veth0 up
ip link set veth0_ep up
nmcli device set veth0 managed yes
nmcli device set veth0_ep managed yes
ip link add veth1 type veth peer name veth1_ep
ip link set veth1 up
ip link set veth1_ep up
nmcli device set veth1 managed yes
nmcli device set veth1_ep managed yes
cat << EOF > ovsbr0-nobond.yaml
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: ovs0
    - name: veth0
- name: ovs0
  type: ovs-interface
  state: up
  ipv4:
    enabled: true
    address:
    - ip: 1.1.1.1
      prefix-length: 24
  ipv6:
    enabled: true
    address:
    - ip: 1::1
      prefix-length: 64
EOF
nmstatectl apply ovsbr0-nobond.yaml
cat << EOF > ovsbr0-add-bond.yaml
interfaces:
- name: ovs-br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: ovs0
    - name: veth0
    - name: ovsbond0
      link-aggregation:
        mode: balance-slb
        port:
        - name: veth1
        - name: dummy0
- name: dummy0
  type: dummy
  state: up
EOF
nmstatectl apply ovsbr0-add-bond.yaml
nmstatectl show ovs0,dummy0,veth[01] | nmstatectl apply
Actual results:
2022-02-16 16:41:22,024 root ERROR Rollback failed with error Activate profile uuid:2b91fb0b-80bb-4393-b530-8b04bd70bcad iface:veth0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_UNKNOWN of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_REMOVED of type NM.DeviceStateReason>
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 366, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 99, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:50fefd30-ae5e-4220-81bd-79158d02fafd iface:veth1 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_UNKNOWN of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_REMOVED of type NM.DeviceStateReason>
Expected results:
No failure
Additional info:
According to my testing:
1. In the last step, the failure reproduces only when all of "ovs0,dummy0,veth[01]" are included.
2. When repeating the last step many times, it occasionally passes. After a pass, even if you clean up the environment and run the reproducer again, it keeps passing. But once you run "systemctl restart NetworkManager", the failure can be reproduced again.
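As a convenience for checking note 2, here is a small loop (my own sketch, not part of the original report; it assumes nmstatectl is on PATH and that the state from the steps above has already been applied) that re-runs the final show | apply step and counts how often it passes:

import subprocess

passes = failures = 0
for attempt in range(20):
    # Capture the current state of the interfaces, exactly as in the
    # reproducer's final step (the pattern string is passed through verbatim).
    show = subprocess.run(
        ["nmstatectl", "show", "ovs0,dummy0,veth[01]"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    # Feed the captured state back to "nmstatectl apply"; a non-zero
    # exit code counts as one occurrence of the failure.
    result = subprocess.run(
        ["nmstatectl", "apply"], input=show.stdout, universal_newlines=True,
    )
    if result.returncode == 0:
        passes += 1
    else:
        failures += 1
print("passes=%d failures=%d" % (passes, failures))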
Hi Mingyu,
This is fixed by nmstate-1.3.3-1.el8 (RHEL 8.7).
Please check again and close as current release if it works for you!
Comment 10, Beniamino Galvani, 2023-03-01 15:38:56 UTC
I've found the root cause, and it's another race condition. Here veth1 is re-activated:
<info> [1677681069.0515] device (veth1): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
When veth1 transitions to state deactivating, it's detached from the ovs port and a "del-interface" command is sent to ovsdb:
<debug> [1677681069.0519] device[a0fcc72049c189dc] (ovsbond0): slave veth1 state change 100 (activated) -> 110 (deactivating)
<trace> [1677681069.0519] device[a0fcc72049c189dc] (ovsbond0): master: release one slave 1cf98085badeca4e/veth1 (enslaved) (configure)
<info> [1677681069.0519] device (ovsbond0): detaching ovs interface veth1
<trace> [1677681069.0519] ovsdb: call[3a31981dc40a5f15]: new: del-interface interface=veth1
In the meantime, the re-activation of veth1 proceeds through its state changes:
<info> [1677681069.1280] device (veth1): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
<info> [1677681069.1375] device (veth1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
<info> [1677681069.1380] device (veth1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
At this point the "del-interface" event is signaled by ovsdb and this spoils the party:
<trace> [1677681069.1390] ovsdb: obj[iface:6a322fb2-b25d-4dee-9d2e-dab25b9f49ee]: removed an 'system' interface: veth1, eea3e660-b1dc-4066-af83-30622b3db641
<info> [1677681069.1390] device (veth1): state change: config -> deactivating (reason 'removed', sys-iface-state: 'managed')
Normally, the bug doesn't happen because the ovs reply comes before the device changes state; the issue is visible only when ovs is slow to reply.
To fix this, ideally NM should wait for the ovs command to terminate before moving the device from deactivating to disconnected.
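To make the ordering concrete, below is a minimal, self-contained simulation of the race (my own sketch; this is not NetworkManager code, and every name in it is hypothetical). Two threads stand in for the device state machine and the ovsdb reply: with WAIT_FOR_OVSDB = False the late "del-interface" reply lands while the device is already in "config" and aborts the activation, matching the trace above; with True the transition out of "deactivating" waits for the ovsdb call to terminate, which is the fix proposed here.

import threading
import time

WAIT_FOR_OVSDB = False  # flip to True to model the proposed fix
OVSDB_DELAY = 0.2       # ovsdb being "slow to reply"

state = "activated"

def set_state(new, reason):
    global state
    print("device (veth1): state change: %s -> %s (reason '%s')"
          % (state, new, reason))
    state = new

def ovsdb_del_interface():
    # Stands in for the asynchronous "del-interface" ovsdb transaction
    # issued when veth1 is detached from the ovs port.
    time.sleep(OVSDB_DELAY)
    # The reply finally arrives: ovsdb signals the interface was removed.
    if state in ("prepare", "config"):
        set_state("deactivating", "removed")  # the bug: activation aborted

def device_reactivate():
    set_state("deactivating", "new-activation")
    reply = threading.Thread(target=ovsdb_del_interface)
    reply.start()
    if WAIT_FOR_OVSDB:
        # Proposed fix: don't leave "deactivating" until ovsdb has answered.
        reply.join()
    set_state("disconnected", "new-activation")
    set_state("prepare", "none")
    set_state("config", "none")
    time.sleep(OVSDB_DELAY * 2)  # give a late reply time to land
    reply.join()
    if state == "config":
        set_state("activated", "none")
    else:
        print("activation failed: reason=removed")

device_reactivate()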
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:6585