Description of problem:
A VM with a secondary NIC fails to start.

Version-Release number of selected component (if applicable):
CNV: brew.registry.redhat.io/rh-osbs/iib:545376 (v4.14.0.rhel9-1403)
cluster-network-addons-operator-rhel9:v4.14.0-23
multus-dynamic-networks-rhel9:v4.14.0-23

How reproducible:
100% (observed several times, on different clusters)

Steps to Reproduce:
1. Create a linux-bridge on the cluster workers using a NodeNetworkConfigurationPolicy like the attached bridge-nncp.yaml:

$ oc apply -f bridge-nncp.yaml
nodenetworkconfigurationpolicy.nmstate.io/linux-bridge-nncp created
$ oc get nncp -w
NAME                STATUS        REASON
linux-bridge-nncp
linux-bridge-nncp   Progressing   ConfigurationProgressing
...
linux-bridge-nncp   Progressing   ConfigurationProgressing
linux-bridge-nncp   Available     SuccessfullyConfigured

2. Create a NetworkAttachmentDefinition like the attached bridge-nad.yaml:

$ oc apply -f bridge-nad.yaml
networkattachmentdefinition.k8s.cni.cncf.io/br1test-nad created

3. Create and start a VM like the attached vma.yaml (please note that I scheduled the VM on a specific node using a nodeSelector, just so I know which journalctl to follow for debugging):

$ oc apply -f vma.yaml
virtualmachine.kubevirt.io/vma created
$ virtctl start vma
VM vma was scheduled to start

Actual results:
A VMI is created, but it is stuck in the `Scheduling` state.

Expected results:
The VMI should complete scheduling and run successfully.

Additional info:
1. The following failure appears in the virt-launcher pod (the full virt-launcher describe output is attached):

Warning  FailedCreatePodSandBox  4s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-vma-p2hvq_yoss-ns_fdc21910-7def-499f-b7b5-f47e945ebfac_0(8ac7f5a3d9da6f351c8a48c577dac4b14c78833efdbd20319a7fa4b1551df90f): error adding pod yoss-ns_virt-launcher-vma-p2hvq to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [yoss-ns/virt-launcher-vma-p2hvq/fdc21910-7def-499f-b7b5-f47e945ebfac:br1test]: error adding container to network "br1test": failed to find plugin "cnv-bridge" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]

2. The issue occurs when attempting to schedule Fedora 37 and Fedora 38 VMs (I didn't check other OSes or versions).
3. The bug was found on an OVN-Kubernetes cluster. I didn't check it on OpenShiftSDN.
4. The same scenario with an ovs-bridge on the node passed successfully.
5. In CNV 4.14.0, this scenario was last observed to pass successfully with CNV v4.14.0.rhel9-1328 (it may have passed on later versions as well, but this is the last version where we have actual evidence that it passed). According to VersionExplorer, this bundle includes:
   multus-dynamic-networks-rhel9 v4.14.0-17
   cluster-network-addons-operator-rhel9 v4.14.0-17
6. Also attached are:
   - journalctl output from the node where the VMI was attempted to be scheduled (one for the Fedora 37 case and one for the Fedora 38 case)
   - virt-launcher pod describe output (one for the Fedora 37 case and one for the Fedora 38 case)
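The error reported by multus means the `cnv-bridge` binary is absent from every directory in its CNI plugin search path. A minimal sketch of how one might confirm that on the affected worker (e.g. via `oc debug node/<node>` or SSH); the directory list is copied from the error message itself, and the loop is illustrative, not part of the original report:

```shell
# Check each CNI plugin directory from the multus error message
# for an executable cnv-bridge binary.
for d in /opt/multus/bin /var/lib/cni/bin /usr/libexec/cni; do
  if [ -x "$d/cnv-bridge" ]; then
    echo "cnv-bridge present in $d"
  else
    echo "cnv-bridge missing in $d"
  fi
done
```

On a node hitting this bug, all three directories should report the binary as missing, matching the `failed to find plugin "cnv-bridge"` failure above.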
I also see this issue. The launcher pod shows an event with an error like the one below (combined from similar events):

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-centos-stream8-following-partridge-bsmxx_default_9c3bc63c-226a-4c27-9106-a40c7c928758_0(6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7): error adding pod default_virt-launcher-centos-stream8-following-partridge-bsmxx to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-centos-stream8-following-partridge-bsmxx/9c3bc63c-226a-4c27-9106-a40c7c928758:ovn-nad]: error adding container to network "ovn-nad": CNI request failed with status 400: '[default/virt-launcher-centos-stream8-following-partridge-bsmxx 6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7 network ovn-nad NAD default/ovn-nad] [default/virt-launcher-centos-stream8-following-partridge-bsmxx 6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7 network ovn-nad NAD default/ovn-nad] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
@gouyang that is a different issue. In the case of this BZ, the bridge CNI binary was not installed on the worker, so it could not be found. In your case, something is going wrong at the Kubernetes API level, not in the host filesystem. Have you tried reproducing that issue with Pods instead of VMs?
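To reproduce such a failure without KubeVirt in the picture, one could attach a plain Pod to the same NetworkAttachmentDefinition via the standard Multus annotation. A hypothetical manifest sketch, assuming the `ovn-nad` NAD in the `default` namespace from the event above (the pod name and image are illustrative):

```yaml
# Hypothetical reproduction: a plain Pod attached to the same NAD the
# failing virt-launcher pod used. If sandbox creation fails here too,
# the problem is independent of KubeVirt.
apiVersion: v1
kind: Pod
metadata:
  name: nad-repro-pod          # illustrative name
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: ovn-nad
spec:
  containers:
  - name: sleeper
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
```

If this Pod also gets stuck with a FailedCreatePodSandBox event, the failure is reproducible at the CNI level rather than in the virtualization stack.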
Verified by following the same scenario as in the BZ description.

CNV 4.14.0 (brew.registry.redhat.io/rh-osbs/iib:548986)
cluster-network-addons-operator-rhel9:v4.14.0-26
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817