Bug 1969889
| Summary: | Unable to create vlan device after 2.5.7 to 2.6.5 upgrade Error: Activate profile: bond0.400 failed: error=nm-manager-error-quark: Failed to find a compatible device for this connection | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Marius Cornea <mcornea> | |
| Component: | nmstate | Assignee: | Gris Ge <fge> | |
| Status: | CLOSED ERRATA | QA Contact: | Mingyu Shi <mshi> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 8.3 | CC: | achernet, cnv-qe-bugs, ealcaniz, ferferna, jiji, jishi, keyoung, network-qe, rkhan, till, yprokule | |
| Target Milestone: | beta | Keywords: | Triaged | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1970056 1970058 | Environment: | ||
| Last Closed: | 2021-11-09 17:43:51 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1970056, 1970058 | |||
This does not seem to be kubernetes-nmstate specific.
For context, this is happening after upgrade from RHEL 8.2 to 8.3. This is the full log of the nmstatectl run:
{"level":"info","ts":1623238520.8554049,"logger":"enactmentstatus","msg":"status: {DesiredState:interfaces:
- name: bond0.400
  state: up
  type: vlan
  vlan:
    base-iface: bond0
    id: 400
- bridge:
    options:
      stp:
        enabled: false
    port:
    - name: bond0.400
  description: Linux bridge with bond0 vlan 400 as a port!
  name: test-mgmt
  state: up
  type: linux-bridge
- name: bond0.401
  state: up
  type: vlan
  vlan:
    base-iface: bond0
    id: 401
- bridge:
    options:
      stp:
        enabled: false
    port:
    - name: bond0.401
  description: Linux bridge with bond0 vlan 301 as a port!
  name: test-ha
  state: up
  type: linux-bridge
PolicyGeneration:2 Conditions:[{Type:Failing Status:True Reason:FailedToConfigure Message:error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' '' '2021-06-09 11:35:20,599 root DEBUG Interface br-ex found. Merging the interface information.
2021-06-09 11:35:20,600 root DEBUG Interface br-ext found. Merging the interface information.
2021-06-09 11:35:20,600 root DEBUG Interface br-int found. Merging the interface information.
2021-06-09 11:35:20,600 root DEBUG Interface br-local found. Merging the interface information.
2021-06-09 11:35:20,646 root WARNING The interface br-ex is setting br-ex as port. Multiple interfaces with names are not supported and unexpected errors may occur.
2021-06-09 11:35:20,652 root DEBUG Async action: Create checkpoint started
2021-06-09 11:35:20,660 root DEBUG Checkpoint None created for all devices
2021-06-09 11:35:20,660 root DEBUG Async action: Create checkpoint finished
2021-06-09 11:35:20,662 root DEBUG Async action: Add profile: bond0.400 started
2021-06-09 11:35:20,663 root DEBUG Async action: Add profile: test-mgmt started
2021-06-09 11:35:20,663 root DEBUG Async action: Add profile: bond0.401 started
2021-06-09 11:35:20,664 root DEBUG Async action: Add profile: test-ha started
2021-06-09 11:35:20,679 root DEBUG Async action: Add profile: bond0.400 finished
2021-06-09 11:35:20,680 root DEBUG Async action: Add profile: test-mgmt finished
2021-06-09 11:35:20,680 root DEBUG Async action: Add profile: bond0.401 finished
2021-06-09 11:35:20,680 root DEBUG Async action: Add profile: test-ha finished
2021-06-09 11:35:20,680 root DEBUG Async action: Activate profile: test-ha started
2021-06-09 11:35:20,680 root DEBUG Async action: Activate profile: test-mgmt started
2021-06-09 11:35:20,692 root DEBUG Connection activation initiated: dev=test-ha, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2021-06-09 11:35:20,702 root DEBUG Connection activation initiated: dev=test-mgmt, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>
2021-06-09 11:35:20,744 root DEBUG Connection activation succeeded: dev=test-ha, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2021-06-09 11:35:20,745 root DEBUG Async action: Activate profile: test-ha finished
2021-06-09 11:35:20,745 root DEBUG Connection activation succeeded: dev=test-mgmt, con-state=<enum NM_ACTIVE_CONNECTION_STATE_ACTIVATING of type NM.ActiveConnectionState>, dev-state=<enum NM_DEVICE_STATE_IP_CONFIG of type NM.DeviceState>, state-flags=<flags NM_ACTIVATION_STATE_FLAG_IS_MASTER | NM_ACTIVATION_STATE_FLAG_LAYER2_READY | NM_ACTIVATION_STATE_FLAG_MASTER_HAS_SLAVES of type NM.ActivationStateFlags>
2021-06-09 11:35:20,745 root DEBUG Async action: Activate profile: test-mgmt finished
2021-06-09 11:35:20,746 root DEBUG Async action: Activate profile: bond0.400 started
2021-06-09 11:35:20,746 root DEBUG Async action: Activate profile: bond0.401 started
2021-06-09 11:35:20,817 root DEBUG Action Activate profile: bond0.400 failed, trying again.
2021-06-09 11:35:20,818 root DEBUG Action Activate profile: bond0.401 failed, trying again.
2021-06-09 11:35:20,820 root DEBUG Async action: Rollback to checkpoint /org/freedesktop/NetworkManager/Checkpoint/2 started
2021-06-09 11:35:20,821 root ERROR Rollback failed with error Activate profile: bond0.401 failed: error=g-io-error-quark: Operation was cancelled (19)
Traceback (most recent call last):
  File \"/usr/bin/nmstatectl\", line 11, in <module>
    load_entry_point('nmstate==0.3.4', 'console_scripts', 'nmstatectl')()
  File \"/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py\", line 67, in main
    return args.func(args)
  File \"/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py\", line 267, in apply
    args.save_to_disk,
  File \"/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py\", line 289, in apply_state
    save_to_disk=save_to_disk,
  File \"/usr/lib/python3.6/site-packages/libnmstate/netapplier.py\", line 73, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File \"/usr/lib/python3.6/site-packages/libnmstate/netapplier.py\", line 106, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File \"/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py\", line 174, in apply_changes
    nm_applier.apply_changes(self.context, net_state, save_to_disk)
  File \"/usr/lib/python3.6/site-packages/libnmstate/nm/applier.py\", line 183, in apply_changes
    _set_ifaces_admin_state(context, ifaces_desired_state, con_profiles)
  File \"/usr/lib/python3.6/site-packages/libnmstate/nm/applier.py\", line 374, in _set_ifaces_admin_state
    context.wait_all_finish()
  File \"/usr/lib/python3.6/site-packages/libnmstate/nm/context.py\", line 215, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile: bond0.400 failed: error=nm-manager-error-quark: Failed to find a compatible device for this connection (3)
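For triage, the failing profiles can be pulled out of a log like the one above with a few lines of Python. This is only a sketch: the regular expressions are inferred from the message formats visible in this particular log (nmstate 0.3.4), not from any documented nmstate log format, and `summarize_failures` is a hypothetical helper name.

```python
import re

# Assumed format of a retry line, as seen in this log:
#   "... root DEBUG Action Activate profile: bond0.400 failed, trying again."
RETRY_RE = re.compile(
    r"Action (?P<action>[\w ]+): (?P<profile>\S+) failed, trying again\."
)

# Assumed format of the final exception line, as seen in this log:
#   "libnmstate.error.NmstateLibnmError: Activate profile: bond0.400 failed: error=<domain>: <message>"
ERROR_RE = re.compile(
    r"NmstateLibnmError: (?P<action>[\w ]+): (?P<profile>\S+) failed: "
    r"error=(?P<domain>[\w-]+): (?P<message>.+)$"
)


def summarize_failures(log_text):
    """Return the profiles that were retried and the final fatal error, if any."""
    retried = []
    fatal = None
    for line in log_text.splitlines():
        match = RETRY_RE.search(line)
        if match:
            retried.append(match.group("profile"))
            continue
        match = ERROR_RE.search(line)
        if match:
            fatal = (
                match.group("profile"),
                match.group("domain"),
                match.group("message"),
            )
    return {"retried": retried, "fatal": fatal}


if __name__ == "__main__":
    # Lines copied from the log in this report.
    sample = "\n".join([
        "2021-06-09 11:35:20,817 root DEBUG Action Activate profile: bond0.400 failed, trying again.",
        "2021-06-09 11:35:20,818 root DEBUG Action Activate profile: bond0.401 failed, trying again.",
        "libnmstate.error.NmstateLibnmError: Activate profile: bond0.400 failed: "
        "error=nm-manager-error-quark: Failed to find a compatible device for this connection (3)",
    ])
    print(summarize_failures(sample))
```

Both VLAN profiles show up as retried, while only bond0.400 is reported in the final exception; the bond0.401 failure appears separately in the rollback error line, which this sketch deliberately does not parse.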
Let me try to recapitulate what we found. Our understanding is that this bug is triggered by configuration done by OVN kubernetes' MCO script, which sets up the host network. This script leaves a duplicate NetworkManager profile for bond0, and that triggers a bug in nmstate. Kudos to Gris for finding the root cause. He's now working on a fix on the nmstate side.

Patch posted upstream at https://github.com/nmstate/nmstate/pull/1631 for the nmstate-0.3 branch (RHEL 8.3). The problem does not exist in nmstate-1.0 (RHEL 8.4) or later versions.

Using this bug to track the RHEL 8.3 effort and the QE effort on downstream testing in RHEL 8.5+.

Verified with versions:
nmstate-1.0.2-13.el8_4.noarch
nispor-1.0.1-4.el8.x86_64
NetworkManager-1.30.0-7.el8.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:4157
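The duplicate-profile condition described above can be checked for mechanically. A minimal sketch, assuming the NetworkManager profiles have already been collected as (UUID, profile name, interface name) tuples, however that is done (for example from nmcli output). The function name `find_duplicate_profiles` and all sample UUIDs and profile names are hypothetical and are not taken from the affected node.

```python
from collections import defaultdict


def find_duplicate_profiles(profiles):
    """Group profiles by interface name and return only names with more than one profile.

    profiles: iterable of (uuid, profile_name, interface_name) tuples.
    """
    by_iface = defaultdict(list)
    for uuid, name, iface in profiles:
        if iface:  # skip profiles that are not bound to an interface name
            by_iface[iface].append((uuid, name))
    return {
        iface: entries
        for iface, entries in by_iface.items()
        if len(entries) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data; the real profile names on the node are unknown.
    sample_profiles = [
        ("11111111-aaaa-4aaa-8aaa-000000000001", "bond0", "bond0"),
        ("11111111-aaaa-4aaa-8aaa-000000000002", "bond0-duplicate", "bond0"),
        ("11111111-aaaa-4aaa-8aaa-000000000003", "br-ex", "br-ex"),
    ]
    for iface, entries in find_duplicate_profiles(sample_profiles).items():
        print(iface, "has", len(entries), "profiles:", entries)
```

Whether a given duplicate actually triggers the nmstate 0.3 failure is a separate question; this only flags interface names that have more than one profile.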
Description of problem:
Following a 4.6.17 -> 4.6.25 -> 4.7.11 OCP upgrade and a CNV 2.5.7 -> 2.6.5 upgrade, creating the following NodeNetworkConfigurationPolicy resource fails:

---
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: test-bridges
spec:
  nodeSelector:
    node-role.kubernetes.io/load-balancer: ""
  desiredState:
    interfaces:
    - name: bond0.400
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 400
    - name: test-mgmt
      description: Linux bridge with bond0 vlan 400 as a port!
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: bond0.400
    - name: bond0.401
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 401
    - name: test-ha
      description: Linux bridge with bond0 vlan 301 as a port!
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: bond0.401

raise tmp_error\nlibnmstate.error.NmstateLibnmError: Activate profile: bond0.400 failed: error=nm-manager-error-quark: Failed to find a compatible device for this connection

Version-Release number of selected component (if applicable):
2.6.5

How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 4.6.17 with CNV and SR-IOV operators
2. Create attached sriovnetworknodepolicy, sriovnetwork, NodeNetworkConfigurationPolicy and VirtualMachine
3. Upgrade OCP to 4.6.25 and then to 4.7.11
4. Upgrade CNV to 2.6.5
5. Create the NodeNetworkConfigurationPolicy referenced above

Actual results:
NodeNetworkConfigurationPolicy fails to create

Expected results:
NodeNetworkConfigurationPolicy gets created successfully

Additional info:
Attaching full nmstate pods logs.