Description of problem:
Following a 4.6.17 -> 4.6.25 -> 4.7.11 OCP upgrade and a CNV 2.5.7 -> 2.6.5 upgrade, creating the following NodeNetworkConfigurationPolicy resource fails:

---
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: test-bridges
spec:
  nodeSelector:
    node-role.kubernetes.io/load-balancer: ""
  desiredState:
    interfaces:
    - name: bond0.400
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 400
    - name: test-mgmt
      description: Linux bridge with bond0 vlan 400 as a port!
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: bond0.400
    - name: bond0.401
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 401
    - name: test-ha
      description: Linux bridge with bond0 vlan 301 as a port!
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: bond0.401

with the error:

raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile: bond0.400 failed: error=nm-manager-error-quark: Failed to find a compatible device for this connection

Version-Release number of selected component (if applicable):
2.6.5

How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 4.6.17 with the CNV and SR-IOV operators.
2. Create the attached sriovnetworknodepolicy, sriovnetwork, NodeNetworkConfigurationPolicy and VirtualMachine.
3. Upgrade OCP to 4.6.25 and then to 4.7.11.
4. Upgrade CNV to 2.6.5.
5. Create the NodeNetworkConfigurationPolicy shown above.

Actual results:
Creating the NodeNetworkConfigurationPolicy fails.

Expected results:
The NodeNetworkConfigurationPolicy is created successfully.

Additional info:
Attaching the full nmstate pod logs.
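The policy above follows a simple pattern: each VLAN on bond0 is the only port of its own Linux bridge, with STP disabled. As a minimal sketch, assuming you want to generate such policies instead of writing them by hand, the hypothetical helper `vlan_bridge_policy` below builds the same resource as a Python dict (the free-text `description` fields are omitted). It is illustrative only and not part of kubernetes-nmstate.

```python
import json


def vlan_bridge_policy(name, base_iface, bridges, node_selector):
    """Build a NodeNetworkConfigurationPolicy dict (hypothetical helper).

    `bridges` maps bridge name -> VLAN id, e.g. {"test-mgmt": 400}.
    Each VLAN is created on `base_iface` and used as the single port
    of its bridge.
    """
    interfaces = []
    for bridge, vlan_id in bridges.items():
        vlan_iface = f"{base_iface}.{vlan_id}"
        # VLAN sub-interface on top of the base interface (bond0 in this bug).
        interfaces.append({
            "name": vlan_iface,
            "type": "vlan",
            "state": "up",
            "vlan": {"base-iface": base_iface, "id": vlan_id},
        })
        # Linux bridge that uses the VLAN as its only port, with STP off.
        interfaces.append({
            "name": bridge,
            "type": "linux-bridge",
            "state": "up",
            "bridge": {
                "options": {"stp": {"enabled": False}},
                "port": [{"name": vlan_iface}],
            },
        })
    return {
        "apiVersion": "nmstate.io/v1alpha1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": node_selector,
            "desiredState": {"interfaces": interfaces},
        },
    }


if __name__ == "__main__":
    policy = vlan_bridge_policy(
        "test-bridges",
        "bond0",
        {"test-mgmt": 400, "test-ha": 401},
        {"node-role.kubernetes.io/load-balancer": ""},
    )
    # JSON is valid YAML, so the output could be piped to `oc apply -f -`.
    print(json.dumps(policy, indent=2))
```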
This does not seem to be kubernetes-nmstate specific. For context, this is happening after the upgrade from RHEL 8.2 to 8.3. This is the full log of the nmstatectl run:

{"level":"info","ts":1623238520.8554049,"logger":"enactmentstatus","msg":"status: {DesiredState:interfaces:
- name: bond0.400
  state: up
  type: vlan
  vlan:
    base-iface: bond0
    id: 400
- bridge:
    options:
      stp:
        enabled: false
    port:
    - name: bond0.400
  description: Linux bridge with bond0 vlan 400 as a port!
  name: test-mgmt
  state: up
  type: linux-bridge
- name: bond0.401
  state: up
  type: vlan
  vlan:
    base-iface: bond0
    id: 401
- bridge:
    options:
      stp:
        enabled: false
    port:
    - name: bond0.401
  description: Linux bridge with bond0 vlan 301 as a port!
  name: test-ha
  state: up
  type: linux-bridge
 PolicyGeneration:2 Conditions:[{Type:Failing Status:True Reason:FailedToConfigure Message:error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' '' '
2021-06-09 11:35:20,599 root DEBUG Interface br-ex found. Merging the interface information.
2021-06-09 11:35:20,600 root DEBUG Interface br-ext found. Merging the interface information.
2021-06-09 11:35:20,600 root DEBUG Interface br-int found. Merging the interface information.
2021-06-09 11:35:20,600 root DEBUG Interface br-local found. Merging the interface information.
2021-06-09 11:35:20,646 root WARNING The interface br-ex is setting br-ex as port. Multiple interfaces with names are not supported and unexpected errors may occur.
2021-06-09 11:35:20,652 root DEBUG Async action: Create checkpoint started
2021-06-09 11:35:20,660 root DEBUG Checkpoint None created for all devices
2021-06-09 11:35:20,660 root DEBUG Async action: Create checkpoint finished
2021-06-09 11:35:20,662 root DEBUG Async action: Add profile: bond0.400 started
2021-06-09 11:35:20,663 root DEBUG Async action: Add profile: test-mgmt started
2021-06-09 11:35:20,663 root DEBUG Async action: Add profile: bond0.401 started
2021-06-09 11:35:20,664 root DEBUG Async action: Add profile: test-ha started
2021-06-09 11:35:20,679 root DEBUG Async action: Add profile: bond0.400 finished
2021-06-09 11:35:20,680 root DEBUG Async action: Add profile: test-mgmt finished
2021-06-09 11:35:20,680 root DEBUG Async action: Add profile: bond0.401 finished
2021-06-09 11:35:20,680 root DEBUG Async action: Add profile: test-ha finished
2021-06-09 11:35:20,680 root DEBUG Async action: Activate profile: test-ha started
2021-06-09 11:35:20,680 root DEBUG Async action: Activate profile: test-mgmt started
2021-06-09 11:35:20,692 root DEBUG Connection activation initiated: dev=test-ha, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2021-06-09 11:35:20,702 root DEBUG Connection activation initiated: dev=test-mgmt, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2021-06-09 11:35:20,744 root DEBUG Connection activation succeeded: dev=test-ha, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2021-06-09 11:35:20,745 root DEBUG Async action: Activate profile: test-ha finished
2021-06-09 11:35:20,745 root DEBUG Connection activation succeeded: dev=test-mgmt, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2021-06-09 11:35:20,745 root DEBUG Async action: Activate profile: test-mgmt finished
2021-06-09 11:35:20,746 root DEBUG Async action: Activate profile: bond0.400 started
2021-06-09 11:35:20,746 root DEBUG Async action: Activate profile: bond0.401 started
2021-06-09 11:35:20,817 root DEBUG Action Activate profile: bond0.400 failed, trying again.
2021-06-09 11:35:20,818 root DEBUG Action Activate profile: bond0.401 failed, trying again.
2021-06-09 11:35:20,820 root DEBUG Async action: Rollback to checkpoint /org/freedesktop/NetworkManager/Checkpoint/2 started
2021-06-09 11:35:20,821 root ERROR Rollback failed with error Activate profile: bond0.401 failed: error=g-io-error-quark: Operation was cancelled (19)
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==0.3.4', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 67, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 267, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 289, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 73, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 106, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 174, in apply_changes
    nm_applier.apply_changes(self.context, net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/applier.py", line 183, in apply_changes
    _set_ifaces_admin_state(context, ifaces_desired_state, con_profiles)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/applier.py", line 374, in _set_ifaces_admin_state
    context.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 215, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile: bond0.400 failed: error=nm-manager-error-quark: Failed to find a compatible device for this connection (3)
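When reading an nmstatectl log like the one above, it can help to reduce it to the last reported status of each async action. The sketch below is a hypothetical helper, not part of nmstate, and assumes records follow the `<date> <time>,<ms> <logger> <LEVEL> <message>` shape seen in this log. Because it splits on timestamps rather than newlines, it also copes with the log being flattened onto one line, as it is inside the enactment status message. On this log it would report both bridge activations (test-ha, test-mgmt) as finished and both VLAN activations (bond0.400, bond0.401) as failed.

```python
import re

# One nmstatectl log record: "<date> <time>,<ms> <logger> <LEVEL> <message>".
# The message runs up to the next timestamp (or the end of the text), so
# records are found even when the whole log sits on a single line.
TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
RECORD = re.compile(
    rf"({TIMESTAMP}) (\S+) (DEBUG|INFO|WARNING|ERROR) (.*?)(?=\s*{TIMESTAMP} |\Z)",
    re.DOTALL,
)
STARTED = re.compile(r"Async action: (.+) started$")
FINISHED = re.compile(r"Async action: (.+) finished$")
FAILED = re.compile(r"Action (.+) failed, trying again\.$")


def summarize_actions(log_text):
    """Map each async action to its last reported status, in first-seen order."""
    status = {}
    for _ts, _logger, _level, message in RECORD.findall(log_text):
        message = message.strip()
        for pattern, state in ((STARTED, "started"),
                               (FINISHED, "finished"),
                               (FAILED, "failed")):
            match = pattern.match(message)
            if match:
                # Re-assigning an existing key keeps its original position.
                status[match.group(1)] = state
                break
    return status
```

Records that match none of the three patterns (checkpoint details, activation state dumps, the final ERROR with its traceback) are simply skipped.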
Let me try to recapitulate what we found. Our understanding is that this bug is triggered by the configuration that OVN-Kubernetes' MCO script applies when it sets up the host network. This script leaves a duplicate NetworkManager profile for bond0, which in turn triggers a bug in nmstate. Kudos to Gris for finding the root cause. He is now working on a fix on the nmstate side.
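The root cause above is a second NetworkManager profile for bond0. A minimal sketch for spotting that situation, assuming "duplicate" means two or more profiles with the same `connection.interface-name`, is to group profiles by that property. The helper name, input format and sample profile names are hypothetical; the data could be collected, for example, by listing UUIDs with `nmcli -t -f UUID connection show` and querying each with `nmcli -g connection.interface-name connection show <uuid>`.

```python
from collections import defaultdict


def find_duplicate_profiles(profiles):
    """Return {interface_name: [(profile_name, uuid), ...]} for every interface
    that two or more NetworkManager profiles are bound to (hypothetical helper).

    `profiles` is an iterable of (profile_name, uuid, interface_name) tuples.
    Profiles with an empty interface name (e.g. ones matched by MAC address)
    are ignored by this sketch.
    """
    by_iface = defaultdict(list)
    for name, uuid, iface in profiles:
        if iface:
            by_iface[iface].append((name, uuid))
    return {iface: found for iface, found in by_iface.items() if len(found) > 1}


if __name__ == "__main__":
    # Hypothetical data: two profiles claim bond0.
    profiles = [
        ("bond0", "uuid-1", "bond0"),
        ("bond0-copy", "uuid-2", "bond0"),
        ("test-mgmt", "uuid-3", "test-mgmt"),
    ]
    for iface, found in find_duplicate_profiles(profiles).items():
        print(f"{iface}: {len(found)} profiles -> {found}")
```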
Patch posted upstream at https://github.com/nmstate/nmstate/pull/1631 for the nmstate-0.3 branch (RHEL 8.3). The problem does not exist in nmstate-1.0 (RHEL 8.4) or later versions. Using this bug to track the RHEL 8.3 effort and the QE effort on downstream testing in RHEL 8.5+.
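Since the comment above ties the problem to the nmstate branch (0.3 affected, 1.0 and later not), one quick check is which branch a host's package comes from. The sketch below parses an RPM name-version-release string such as `nmstate-1.0.2-13.el8_4.noarch`. The function names are hypothetical, and comparing dotted numeric versions is a simplification of RPM's real version comparison. A pre-1.0 result only means the host is on the older branch; whether that build already carries the 0.3 backport has to be checked separately.

```python
def parse_nvr(package):
    """Split an RPM 'name-version-release[.arch]' string into its parts.

    Splitting from the right keeps package names that contain '-' intact.
    """
    name, version, release = package.rsplit("-", 2)
    return name, version, release


def on_pre_1_0_branch(package):
    """Return True if this nmstate build predates nmstate-1.0 (hypothetical helper).

    Simplification: assumes a purely numeric dotted version like '0.3.4'.
    """
    name, version, _release = parse_nvr(package)
    if name != "nmstate":
        raise ValueError(f"not an nmstate package: {package}")
    parts = tuple(int(p) for p in version.split("."))
    # Tuples compare element by element, so (0, 3, 4) < (1, 0) < (1, 0, 2).
    return parts < (1, 0)
```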
Verified with versions:
nmstate-1.0.2-13.el8_4.noarch
nispor-1.0.1-4.el8.x86_64
NetworkManager-1.30.0-7.el8.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4157