Created attachment 1837475
worker-1-NetworkManager.log

Description of problem:

Creating the following NodeNetworkConfigurationPolicy resource fails:

$ cat 01-vlan-interface-nncp.yml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ontap-vlan-interface
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1
  desiredState:
    interfaces:
    - name: "eno5np0.350"
      description: vlan interface for NetApp ONTAP storage.
      type: vlan
      state: up
      ipv4:
        enabled: true
        dhcp: true
        auto-gateway: false
      vlan:
        base-iface: "eno5np0"
        id: 350

Error:

Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.0.2', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 73, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 326, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 354, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 407, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 81, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 114, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 95, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 213, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:9c545b9c-3233-4863-af07-5113d9eaaf46 iface:eno5np0.350 type: vlan failed: error=nm-manager-error-quark: Failed to find a compatible device for this connection (3)

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.8.17
Server Version: 4.8.17
Kubernetes Version: v1.21.1+6438632

$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.2   OpenShift Virtualization   4.8.2     kubevirt-hyperconverged-operator.v4.8.1   Succeeded

$ oc -n openshift-cnv exec -it nmstate-handler-8c9z8 -- rpm -q nmstate
nmstate-1.0.2-14.el8_4.noarch

[core@worker-1 ~]$ rpm -qa | grep NetworkManager
NetworkManager-libnm-1.30.0-10.el8_4.x86_64
NetworkManager-cloud-setup-1.30.0-10.el8_4.x86_64
NetworkManager-1.30.0-10.el8_4.x86_64
NetworkManager-ovs-1.30.0-10.el8_4.x86_64
NetworkManager-team-1.30.0-10.el8_4.x86_64
NetworkManager-tui-1.30.0-10.el8_4.x86_64

How reproducible:

100%

Steps to Reproduce:

- Create the VLAN interface with the data above.

$ oc create -f 01-vlan-interface-nncp.yml
nodenetworkconfigurationpolicy.nmstate.io/ontap-vlan-interface created

- Validate the nncp status.

$ oc get nncp
NAME                   STATUS
ontap-vlan-interface   FailedToConfigure

- Verify the enactment of the node.

$ oc get nnce
NAME                            STATUS
master-0.ontap-vlan-interface   NodeSelectorNotMatching
master-1.ontap-vlan-interface   NodeSelectorNotMatching
master-2.ontap-vlan-interface   NodeSelectorNotMatching
worker-0.ontap-vlan-interface   NodeSelectorNotMatching
worker-1.ontap-vlan-interface   FailedToConfigure
worker-2.ontap-vlan-interface   NodeSelectorNotMatching
worker-3.ontap-vlan-interface   NodeSelectorNotMatching

- Get more details of the error (see attached log worker-1.ontap-vlan-interface_nnce.log).

$ oc get nnce worker-1.ontap-vlan-interface -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'

- The NetworkManager logs on the node show similar messages (attached log worker-1-NetworkManager.log).

[core@worker-1 ~]$ sudo journalctl -u NetworkManager -f

- Logs from the nmstate-handler pod of the node also show similar messages (attached log worker-1-nmstate-handler.log).

Actual results:

- The VLAN is not set up through the NMState policy.
- If we set it up manually with nmcli (just to test), it works:

[core@worker-1 ~]$ nmcli c
NAME              UUID                                  TYPE           DEVICE
ovs-if-br-ex      bea2fb68-e23d-40f3-8191-2b77caae4f71  ovs-interface  br-ex
Wired Connection  e8764ffe-444b-41e0-bace-4fc6a07756ab  ethernet       eno5np0
Wired Connection  e8764ffe-444b-41e0-bace-4fc6a07756ab  ethernet       enp1s0f4u4
br-ex             458e00ef-79fb-4a62-b5eb-18d3412ba5bf  ovs-bridge     br-ex
ovs-if-phys0      4b9e1ed3-bee0-4b12-836a-c7e498750368  ethernet       ens3f1
ovs-port-br-ex    1ad3bd72-baee-4776-bb00-941fc0f72f4f  ovs-port       br-ex
ovs-port-phys0    61cf1f80-e6f4-4767-9296-de7aa1936c98  ovs-port       ens3f1
Wired Connection  809bf544-3bbf-4f7d-849f-96c0e4a28f8e  ethernet       --

[core@worker-1 ~]$ sudo nmcli con add type vlan con-name eno5np0.350 ifname eno5np0.350 dev eno5np0 id 350
Connection 'eno5np0.350' (8c519760-10c2-4cd6-a3b3-28ccbfcf44a4) successfully added.

[core@worker-1 ~]$ nmcli c
NAME              UUID                                  TYPE           DEVICE
ovs-if-br-ex      bea2fb68-e23d-40f3-8191-2b77caae4f71  ovs-interface  br-ex
eno5np0.350       8c519760-10c2-4cd6-a3b3-28ccbfcf44a4  vlan           eno5np0.350
Wired Connection  e8764ffe-444b-41e0-bace-4fc6a07756ab  ethernet       eno5np0
Wired Connection  e8764ffe-444b-41e0-bace-4fc6a07756ab  ethernet       enp1s0f4u4
br-ex             458e00ef-79fb-4a62-b5eb-18d3412ba5bf  ovs-bridge     br-ex
ovs-if-phys0      4b9e1ed3-bee0-4b12-836a-c7e498750368  ethernet       ens3f1
ovs-port-br-ex    1ad3bd72-baee-4776-bb00-941fc0f72f4f  ovs-port       br-ex
ovs-port-phys0    61cf1f80-e6f4-4767-9296-de7aa1936c98  ovs-port       ens3f1
Wired Connection  809bf544-3bbf-4f7d-849f-96c0e4a28f8e  ethernet       --

[core@worker-1 ~]$ ip a s eno5np0.350
37: eno5np0.350@eno5np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f4:03:43:cc:c1:90 brd ff:ff:ff:ff:ff:ff
    inet 192.168.53.21/24 brd 192.168.53.255 scope global dynamic noprefixroute eno5np0.350
       valid_lft 7186sec preferred_lft 7186sec
    inet6 fe80::a58f:9bc8:c440:aaee/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Expected results:

The NNCP should set up the VLAN interface.

Additional info:

- I set up a NodeSelector just to test one server, but it also fails when applied to all nodes.
- I tested with the v1alpha1 version too, with the same results.
- I tried applying the nncp a few dozen times and one node was successful, but it stops because it fails on the others.
- We confirmed that the version of NetworkManager is the same on the host and in the nmstate containers:

$ oc -n openshift-cnv get pods -o wide | grep nmstate | grep worker-1
nmstate-handler-8c9z8   1/1   Running   0   31h   192.168.52.21   worker-1   <none>   <none>

$ oc -n openshift-cnv exec -it nmstate-handler-8c9z8 -- rpm -q NetworkManager
NetworkManager-1.30.0-10.el8_4.x86_64

$ ssh core@worker-1 "rpm -q NetworkManager"
NetworkManager-1.30.0-10.el8_4.x86_64

- We also enabled debug logging in NetworkManager and attached the part of the log from when we applied the policy (see NetworkManager-Debug-worker-1.log).
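For anyone collecting these failures across several nodes, the jsonpath query from the reproduction steps can be mirrored in a few lines of Python. This is only a sketch: it assumes the enactment was saved with `oc get nnce <name> -o json`, the function name `failing_messages` is made up for illustration, and the sample document is hand-written (it is not output captured from this cluster).

```python
import json
from typing import List


def failing_messages(nnce_json: str) -> List[str]:
    """Return the message of every status condition whose type is "Failing".

    Mirrors the jsonpath filter used in the report:
    {.status.conditions[?(@.type=="Failing")].message}
    """
    enactment = json.loads(nnce_json)
    conditions = enactment.get("status", {}).get("conditions", [])
    return [c.get("message", "") for c in conditions if c.get("type") == "Failing"]


# Hand-written sample in the shape the jsonpath query implies (hypothetical data).
SAMPLE_NNCE = json.dumps(
    {
        "status": {
            "conditions": [
                {"type": "Available", "message": ""},
                {"type": "Failing", "message": "Failed to find a compatible device"},
            ]
        }
    }
)

if __name__ == "__main__":
    for message in failing_messages(SAMPLE_NNCE):
        print(message)
```

Like the jsonpath expression, the filter looks only at the condition type and ignores any other fields on the condition.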
Looks like a pair of devices has the same NM connection UUID:

Wired Connection  e8764ffe-444b-41e0-bace-4fc6a07756ab  ethernet  eno5np0
Wired Connection  e8764ffe-444b-41e0-bace-4fc6a07756ab  ethernet  enp1s0f4u4
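A quick way to spot this on other nodes is to group active devices by connection UUID. The sketch below assumes the input comes from `nmcli -t -f NAME,UUID,TYPE,DEVICE connection show` (terse, colon-separated output); the function name `devices_sharing_uuid` is hypothetical, and the sample rows are copied by hand from the table in the report, so real terse output may spell some values differently.

```python
from collections import defaultdict
from typing import Dict, List


def devices_sharing_uuid(terse_output: str) -> Dict[str, List[str]]:
    """Map each connection UUID to its devices, keeping only UUIDs seen on 2+ devices."""
    groups = defaultdict(list)
    for line in terse_output.splitlines():
        if not line.strip():
            continue
        # Split from the right so a connection name containing ':' stays intact.
        _name, uuid, _conn_type, device = line.rsplit(":", 3)
        if device and device != "--":  # skip profiles that are not active on a device
            groups[uuid].append(device)
    return {uuid: devs for uuid, devs in groups.items() if len(devs) > 1}


# Rows taken from the `nmcli c` table in the report, rewritten in terse form by hand.
SAMPLE = """\
ovs-if-br-ex:bea2fb68-e23d-40f3-8191-2b77caae4f71:ovs-interface:br-ex
Wired Connection:e8764ffe-444b-41e0-bace-4fc6a07756ab:ethernet:eno5np0
Wired Connection:e8764ffe-444b-41e0-bace-4fc6a07756ab:ethernet:enp1s0f4u4
Wired Connection:809bf544-3bbf-4f7d-849f-96c0e4a28f8e:ethernet:
"""

if __name__ == "__main__":
    for uuid, devices in devices_sharing_uuid(SAMPLE).items():
        print(uuid, devices)
```

A shared UUID on its own is not proof of a problem, since a single NetworkManager profile can be allowed to activate on several devices.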
This is a dup of https://bugzilla.redhat.com/show_bug.cgi?id=2008446, marking as so.
*** This bug has been marked as a duplicate of bug 2008446 ***
Reopening, since it looks like a duplicated UUID is not an issue when we have a configuration like:

```
[connection]
id=Wired Connection
uuid=4258b2a5-d6dd-4ea5-8c20-9eb5c211d14c
type=ethernet
autoconnect-retries=1
multi-connect=3
permissions=
wait-device-timeout=60000
```
Hi Quique,

Thanks for re-opening this BZ. In addition to my tests in the private comment of BZ#2008446, we tested a new OCP 4.8 deployment with kubevirt-hyperconverged-operator v2.6.8 from the 4.7 Catalog Source, since we are in a disconnected environment, and we confirmed that this combination has no issues. The problem only appears when we install kubevirt-hyperconverged-operator v4.8.2 from the 4.8 Catalog Source.

Again, if there is any output of tests you would like us to perform, feel free to reach out!

Thanks
Manuel
(In reply to Manuel Rodriguez from comment #10)
> Hi Quique,
>
> Thanks for re-opening this BZ. Additionally to my tests in private comment
> of BZ#2008446 we tested a new OCP 4.8 deployment with
> kubevirt-hyperconverged-operator v2.6.8 from Catalog Source of 4.7 since we
> are in a disconnected environment, and we confirmed with this combination we
> do not have issues, it's just when we install the operator
> kubevirt-hyperconverged-operator v4.8.2 from Catalog Source 4.8.
>
> Again, if there is any output of tests you would like us to perform, feel
> free to reach out!
>
> Thanks
> Manuel

Hey Manuel,

Can you try marking the parent interface with state up?

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ontap-vlan-interface
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1
  desiredState:
    interfaces:
    - name: eno5np0
      state: up
      type: ethernet
    - name: eno5np0.350
      description: vlan interface for NetApp ONTAP storage.
      type: vlan
      state: up
      ipv4:
        enabled: true
        dhcp: true
        auto-gateway: false
      vlan:
        base-iface: "eno5np0"
        id: 350
@manrodri even the NNS before applying the NNCP would be nice to have.
@ellorent

TL;DR: I did some tests with your suggestions and it seems to work this time, but I need to do more tests. Because I already had a cluster with the eno5np0.350 interfaces set up via MC, I created a different interface with NMstate.

- Initial setup

[kni.dfwt5g.lab manny]$ oc version
Client Version: 4.8.26
Server Version: 4.8.26
Kubernetes Version: v1.21.6+bb8d50a

[kni.dfwt5g.lab manny]$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                      VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.3   OpenShift Virtualization     4.8.3     kubevirt-hyperconverged-operator.v4.8.2   Succeeded
performance-addon-operator.v4.8.5         Performance Addon Operator   4.8.5                                               Succeeded

$ oc -n openshift-cnv get pods -o wide | grep nmstate | grep worker-0
nmstate-handler-vtqtz   1/1   Running   0   42m   192.168.52.20   worker-0   <none>   <none>

$ oc -n openshift-cnv exec -it nmstate-handler-vtqtz -- rpm -q NetworkManager
NetworkManager-1.30.0-13.el8_4.x86_64

$ ssh core@worker-0 "rpm -q NetworkManager"
NetworkManager-1.30.0-13.el8_4.x86_64

[kni.dfwt5g.lab manny]$ oc get nns
NAME       AGE
master-0   5m16s
master-1   5m16s
master-2   5m16s
worker-0   5m14s
worker-1   5m16s
worker-2   5m15s
worker-3   5m14s

[kni.dfwt5g.lab manny]$ oc get nncp
No resources found

- I created the following interface on three different workers, one by one, modifying the name/hostname accordingly. I got the same failure as in the initial comments on this BZ, then I deleted them:

$ cat ha-worker2.yml
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bigip-ha-policy-worker-2
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-2
  desiredState:
    interfaces:
    - name: eno5np0.546
      description: vlan interface per https://github.com/nmstate/kubernetes-nmstate/blob/master/docs/user-guide-policy-configure-vlan-and-dynamic-ip.md
      type: vlan
      state: up
      vlan:
        base-iface: eno5np0
        id: 546
    - name: bigip-ha
      description: "Linux bridge with eno5np0 vlan 546 for non-routable BigIP-HA"
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno5np0.546

$ oc get nncp
NAME                       STATUS
bigip-ha-policy-worker-0   FailedToConfigure
bigip-ha-policy-worker-1   FailedToConfigure
bigip-ha-policy-worker-2   FailedToConfigure

- Next, I added the config of the parent interface to bring it up, and created the nncps again. This time it worked:

    - name: eno5np0
      state: up
      type: ethernet

$ oc get nncp
NAME                       STATUS
bigip-ha-policy-worker-0   SuccessfullyConfigured
bigip-ha-policy-worker-1   SuccessfullyConfigured
bigip-ha-policy-worker-2   SuccessfullyConfigured

I'll run a test with the initial interface during the weekend to make sure this works and update back.

Thanks,
Manuel
@ellorent

I redeployed the lab and this time set up the eno5np0.350 interface via NMstate. The results confirm my previous comment. First I tried without bringing the parent up in the policy (although it is already up, because that NIC is used by the provisioning network), and it failed with the described error.

- Without bringing the parent interface up:

$ vi 01-vlan-interface-nncp-all.yml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ontap-vlan-interface
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: eno5np0.350
      description: vlan interface for NetApp ONTAP storage.
      type: vlan
      state: up
      ipv4:
        enabled: true
        dhcp: true
        auto-gateway: false
      vlan:
        base-iface: "eno5np0"
        id: 350

$ oc create -f 01-vlan-interface-nncp-all.yml
nodenetworkconfigurationpolicy.nmstate.io/ontap-vlan-interface created

$ oc get nncp
NAME                   STATUS
ontap-vlan-interface   FailedToConfigure

$ oc get nnce
NAME                            STATUS
master-0.ontap-vlan-interface   NodeSelectorNotMatching
master-1.ontap-vlan-interface   NodeSelectorNotMatching
master-2.ontap-vlan-interface   NodeSelectorNotMatching
worker-0.ontap-vlan-interface   FailedToConfigure
worker-1.ontap-vlan-interface   ConfigurationAborted
worker-2.ontap-vlan-interface   ConfigurationAborted
worker-3.ontap-vlan-interface   ConfigurationAborted

$ oc delete -f 01-vlan-interface-nncp-all.yml
nodenetworkconfigurationpolicy.nmstate.io "ontap-vlan-interface" deleted

- Bringing the parent interface up:

$ vi 01-vlan-interface-nncp-all.yml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ontap-vlan-interface
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: eno5np0
      state: up
      type: ethernet
    - name: eno5np0.350
      description: vlan interface for NetApp ONTAP storage.
      type: vlan
      state: up
      ipv4:
        enabled: true
        dhcp: true
        auto-gateway: false
      vlan:
        base-iface: "eno5np0"
        id: 350

$ oc get nncp
No resources found

$ oc create -f 01-vlan-interface-nncp-all.yml
nodenetworkconfigurationpolicy.nmstate.io/ontap-vlan-interface created

$ oc get nncp
NAME                   STATUS
ontap-vlan-interface   SuccessfullyConfigured

$ oc get nnce
NAME                            STATUS
master-0.ontap-vlan-interface   NodeSelectorNotMatching
master-1.ontap-vlan-interface   NodeSelectorNotMatching
master-2.ontap-vlan-interface   NodeSelectorNotMatching
worker-0.ontap-vlan-interface   SuccessfullyConfigured
worker-1.ontap-vlan-interface   SuccessfullyConfigured
worker-2.ontap-vlan-interface   SuccessfullyConfigured
worker-3.ontap-vlan-interface   SuccessfullyConfigured

I think we can close this BZ. I'll create a new one if I find a new issue.

Thanks a lot!
Thanks for the update Manuel. I will keep this BZ open until we file a follow-up for either our documentation to mention this requirement or for NetworkManager to make it possible to configure VLAN without managing the underlying interface.
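Until one of those follow-ups lands, the requirement observed in the tests above (with nmstate 1.0.2, the VLAN's base interface had to be listed in the same desiredState with state: up) can be checked before a policy is applied. The following is a rough sketch and not part of nmstate or kubernetes-nmstate: the function names are invented for illustration, the desiredState is assumed to be already loaded into a Python dict, and the fix-up assumes the parent is a plain ethernet NIC, as it was in this report.

```python
import copy
from typing import List, Tuple


def vlans_missing_parent(desired_state: dict) -> List[Tuple[str, str]]:
    """Return (vlan name, base-iface) pairs whose base interface is not
    listed with state "up" in the same desiredState."""
    interfaces = desired_state.get("interfaces", [])
    managed_up = {i.get("name") for i in interfaces if i.get("state") == "up"}
    missing = []
    for iface in interfaces:
        if iface.get("type") != "vlan":
            continue
        base = iface.get("vlan", {}).get("base-iface")
        if base and base not in managed_up:
            missing.append((iface.get("name"), base))
    return missing


def add_missing_parents(desired_state: dict) -> dict:
    """Return a copy with an entry prepended for every missing parent.

    Assumption: every parent is an ethernet device, as in this report.
    """
    fixed = copy.deepcopy(desired_state)
    parents = []
    for _vlan, base in vlans_missing_parent(desired_state):
        entry = {"name": base, "type": "ethernet", "state": "up"}
        if entry not in parents:
            parents.append(entry)
    fixed["interfaces"] = parents + fixed.get("interfaces", [])
    return fixed


# desiredState from the original failing policy, written as a dict.
FAILING = {
    "interfaces": [
        {
            "name": "eno5np0.350",
            "type": "vlan",
            "state": "up",
            "ipv4": {"enabled": True, "dhcp": True, "auto-gateway": False},
            "vlan": {"base-iface": "eno5np0", "id": 350},
        }
    ]
}

if __name__ == "__main__":
    print(vlans_missing_parent(FAILING))
    print(add_missing_parents(FAILING)["interfaces"][0])
```

The check is deliberately narrow: it only looks inside the policy itself and does not know whether the parent is already up on the node, which matches what was observed here (the NIC was up, yet the policy still failed until the parent was listed).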
Query to the nmstate team: https://bugzilla.redhat.com/show_bug.cgi?id=2058292. If the RFE gets rejected, we'd document the requirement to manage the underlying interface instead.
Changing the target to future for now, as the RFE this depends on is yet to be targeted.
*** Bug 2058514 has been marked as a duplicate of this bug. ***
This should be resolved once kubernetes-nmstate starts using nmstate 2 (probably 4.14). https://bugzilla.redhat.com/show_bug.cgi?id=2058292
4.14 nmstate operator should be now using nmstate 2.0 RPM. I don't know how to get the relevant nmstate bundle version. After deploying 4.14, please first check `rpm -qi nmstate` in the nmstate-handler Pod before verifying this BZ. Sorry for the inconvenience.
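If the check needs to be scripted, the major version can be read from the one-line `rpm -q nmstate` output used earlier in this BZ. This is a small sketch under the assumption that the line follows the usual name-version-release.arch layout; the helper name `nmstate_major` is made up, and the second test string is assembled from the version, release and architecture reported for the 4.14 verification rather than copied from a command output.

```python
def nmstate_major(rpm_q_line: str) -> int:
    """Extract the major version from a line such as
    'nmstate-1.0.2-14.el8_4.noarch' (name-version-release.arch)."""
    nvr, _arch = rpm_q_line.strip().rsplit(".", 1)   # drop the architecture
    _name, version, _release = nvr.rsplit("-", 2)    # split off version and release
    return int(version.split(".")[0])


if __name__ == "__main__":
    # Output shown in the original report (nmstate 1.x, affected).
    print(nmstate_major("nmstate-1.0.2-14.el8_4.noarch"))
    # Assembled from the fields reported for the 4.14 handler pod.
    print(nmstate_major("nmstate-2.2.12-1.el9_2.x86_64"))
```

A result of 2 or higher would indicate that the handler pod carries the nmstate 2 RPM that this BZ is waiting for.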
Verified on v4.14 (PSI cluster):

➜ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-ec.3   True        False         43h     Cluster version is 4.14.0-ec.3

➜ oc get csv -A | grep virt
openshift-cnv   kubevirt-hyperconverged-operator.v4.14.0   OpenShift Virtualization   4.14.0   kubevirt-hyperconverged-operator.v4.13.3   Succeeded

➜ oc exec -it pod/nmstate-handler-n25cz -n openshift-nmstate -- bash
[root@infd-multi-nics-znplw-worker-0-lrbxf /]# rpm -qi nmstate
Name        : nmstate
Version     : 2.2.12
Release     : 1.el9_2
Architecture: x86_64
Install Date: Fri Jul 21 18:12:03 2023
Group       : Unspecified
Size        : 9736389
License     : LGPLv2+
Signature   : RSA/SHA256, Wed Jun 7 11:00:11 2023, Key ID 199e2f91fd431d51
Source RPM  : nmstate-2.2.12-1.el9_2.src.rpm
Build Date  : Wed Jun 7 09:38:53 2023
Build Host  : x86-64-01.build.eng.rdu2.redhat.com
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/nmstate/nmstate
Summary     : Declarative network manager API
Description :
Nmstate is a library with an accompanying command line tool that manages host
networking settings in a declarative manner and aimed to satisfy enterprise
needs to manage host networking through a northbound declarative API and multi
provider support on the southbound.

NNCP created successfully:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens9.1000-nncp
spec:
  desiredState:
    interfaces:
    - ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: ens9.1000
      state: up
      type: vlan
      vlan:
        base-iface: ens9
        id: 1000
  nodeSelector:
    node-type: worker
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817