Description of problem:
A VM with a secondary NIC fails to start.

Version-Release number of selected component (if applicable):
CNV: brew.registry.redhat.io/rh-osbs/iib:545376 (v4.14.0.rhel9-1403)
cluster-network-addons-operator-rhel9:v4.14.0-23
multus-dynamic-networks-rhel9:v4.14.0-23

How reproducible:
100% (observed several times, on different clusters)

Steps to Reproduce:
1. Create a linux-bridge on the cluster workers using a NodeNetworkConfigurationPolicy like the attached bridge-nncp.yaml:

$ oc apply -f bridge-nncp.yaml
nodenetworkconfigurationpolicy.nmstate.io/linux-bridge-nncp created
$ oc get nncp -w
NAME                STATUS        REASON
linux-bridge-nncp
linux-bridge-nncp   Progressing   ConfigurationProgressing
...
linux-bridge-nncp   Progressing   ConfigurationProgressing
linux-bridge-nncp   Available     SuccessfullyConfigured

2. Create a NetworkAttachmentDefinition like the attached bridge-nad.yaml:

$ oc apply -f bridge-nad.yaml
networkattachmentdefinition.k8s.cni.cncf.io/br1test-nad created

3. Create and start a VM like the attached vma.yaml (please note that I scheduled the VM on a specific node using a nodeSelector, just so I know which journalctl to follow for debugging):

$ oc apply -f vma.yaml
virtualmachine.kubevirt.io/vma created
$ virtctl start vma
VM vma was scheduled to start

Actual results:
A VMI is created, but it is stuck in the `Scheduling` state.

Expected results:
The VMI should complete scheduling and run successfully.

Additional info:
1. The following failure appears in the virt-launcher pod (the full virt-launcher describe output is attached):

Warning  FailedCreatePodSandBox  4s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-vma-p2hvq_yoss-ns_fdc21910-7def-499f-b7b5-f47e945ebfac_0(8ac7f5a3d9da6f351c8a48c577dac4b14c78833efdbd20319a7fa4b1551df90f): error adding pod yoss-ns_virt-launcher-vma-p2hvq to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [yoss-ns/virt-launcher-vma-p2hvq/fdc21910-7def-499f-b7b5-f47e945ebfac:br1test]: error adding container to network "br1test": failed to find plugin "cnv-bridge" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]

2. The issue occurs when attempting to schedule Fedora 37 and Fedora 38 VMs (I didn't check other OSes or versions).
3. The bug was found on an OVN-Kubernetes cluster. I didn't check it on OpenShiftSDN.
4. The same scenario with an ovs-bridge on the node passed successfully.
5. In CNV 4.14.0, this scenario was last observed to pass successfully with CNV v4.14.0.rhel9-1328 (it may have passed on later versions as well, but this is the last version where we have actual evidence that it passed). According to VersionExplorer, this bundle includes:
   multus-dynamic-networks-rhel9 v4.14.0-17
   cluster-network-addons-operator-rhel9 v4.14.0-17
6. Also attached are:
   - journalctl output from the node where the VMI was attempted to be scheduled (one for the Fedora 37 case and one for the Fedora 38 case)
   - virt-launcher pod describe output (one for the Fedora 37 case and one for the Fedora 38 case)
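The error reported by multus means the `cnv-bridge` binary is absent from every directory in its CNI plugin search path. A minimal sketch of how one might confirm that on the affected worker (e.g. via `oc debug node/<node>` or SSH); the directory list is copied from the error message itself, and the loop is illustrative, not part of the original report:

```shell
# Check each CNI plugin directory from the multus error message
# for an executable cnv-bridge binary.
for d in /opt/multus/bin /var/lib/cni/bin /usr/libexec/cni; do
  if [ -x "$d/cnv-bridge" ]; then
    echo "cnv-bridge present in $d"
  else
    echo "cnv-bridge missing in $d"
  fi
done
```

On a node hitting this bug, all three directories should report the binary as missing, matching the `failed to find plugin "cnv-bridge"` failure above.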
I also see this issue. The launcher pod shows an event with an error like the one below (combined from similar events):

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-centos-stream8-following-partridge-bsmxx_default_9c3bc63c-226a-4c27-9106-a40c7c928758_0(6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7): error adding pod default_virt-launcher-centos-stream8-following-partridge-bsmxx to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-centos-stream8-following-partridge-bsmxx/9c3bc63c-226a-4c27-9106-a40c7c928758:ovn-nad]: error adding container to network "ovn-nad": CNI request failed with status 400: '[default/virt-launcher-centos-stream8-following-partridge-bsmxx 6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7 network ovn-nad NAD default/ovn-nad] [default/virt-launcher-centos-stream8-following-partridge-bsmxx 6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7 network ovn-nad NAD default/ovn-nad] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
@gouyang that is a different issue. In the case of this BZ, the bridge CNI binary was not installed on the worker, so it could not be found. In your case, something is going wrong at the Kubernetes API level, not in the host filesystem. Have you tried reproducing that issue with Pods instead of VMs?
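To reproduce such a failure without KubeVirt in the picture, one could attach a plain Pod to the same NetworkAttachmentDefinition via the standard Multus annotation. A hypothetical manifest sketch, assuming the `ovn-nad` NAD in the `default` namespace from the event above (the pod name and image are illustrative):

```yaml
# Hypothetical reproduction: a plain Pod attached to the same NAD the
# failing virt-launcher pod used. If sandbox creation fails here too,
# the problem is independent of KubeVirt.
apiVersion: v1
kind: Pod
metadata:
  name: nad-repro-pod          # illustrative name
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: ovn-nad
spec:
  containers:
  - name: sleeper
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
```

If this Pod also gets stuck with a FailedCreatePodSandBox event, the failure is reproducible at the CNI level rather than in the virtualization stack.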
Verified by following the same scenario as in the BZ description.

CNV 4.14.0 (brew.registry.redhat.io/rh-osbs/iib:548986)
cluster-network-addons-operator-rhel9:v4.14.0-26
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817