Bug 2228240 - VM with secondary interface can't start
Summary: VM with secondary interface can't start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.14.0
Assignee: oshoval
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-01 19:23 UTC by Yossi Segev
Modified: 2023-11-08 14:06 UTC (History)

Fixed In Version: cluster-network-addons-operator-rhel9 v4.14.0-26
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:06:16 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt cluster-network-addons-operator pull 1592 | 0 | None | open | linux bridge: Fix softlink path | 2023-08-02 08:21:17 UTC
Red Hat Issue Tracker | CNV-31593 | 0 | None | None | None | 2023-08-01 19:25:58 UTC
Red Hat Product Errata | RHSA-2023:6817 | 0 | None | None | None | 2023-11-08 14:06:32 UTC

Description Yossi Segev 2023-08-01 19:23:55 UTC
Description of problem:
A VM with secondary NIC fails to start.


Version-Release number of selected component (if applicable):
CNV: brew.registry.redhat.io/rh-osbs/iib:545376 (v4.14.0.rhel9-1403)
cluster-network-addons-operator-rhel9:v4.14.0-23
multus-dynamic-networks-rhel9:v4.14.0-23


How reproducible:
100% (observed several times, on different clusters)


Steps to Reproduce:
1. Create a linux-bridge on the cluster workers using a NodeNetworkConfigurationPolicy like the attached bridge-nncp.yaml:
$ oc apply -f bridge-nncp.yaml 
nodenetworkconfigurationpolicy.nmstate.io/linux-bridge-nncp created
$
$ oc get nncp -w
NAME                STATUS   REASON
linux-bridge-nncp            
linux-bridge-nncp   Progressing   ConfigurationProgressing
...
linux-bridge-nncp   Progressing   ConfigurationProgressing
linux-bridge-nncp   Available     SuccessfullyConfigured

2. Create a NetworkAttachmentDefinition like the attached bridge-nad.yaml:
$ oc apply -f bridge-nad.yaml 
networkattachmentdefinition.k8s.cni.cncf.io/br1test-nad created

3. Create and start a VM like the one in the attached vma.yaml:
(Note that I scheduled the VM on a specific node using a nodeSelector, so that I know which node's journalctl output to follow for debugging.)
$ oc apply -f vma.yaml 
virtualmachine.kubevirt.io/vma created
$
$ virtctl start vma
VM vma was scheduled to start


Actual results:
<BUG>
A VMI is created, but it is stuck in the `Scheduling` state.


Expected results:
VMI should complete scheduling and run successfully.


Additional info:
1. The following failure appears in the virt-launcher pod (the full virt-launcher describe output is attached):
  Warning  FailedCreatePodSandBox  4s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-vma-p2hvq_yoss-ns_fdc21910-7def-499f-b7b5-f47e945ebfac_0(8ac7f5a3d9da6f351c8a48c577dac4b14c78833efdbd20319a7fa4b1551df90f): error adding pod yoss-ns_virt-launcher-vma-p2hvq to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [yoss-ns/virt-launcher-vma-p2hvq/fdc21910-7def-499f-b7b5-f47e945ebfac:br1test]: error adding container to network "br1test": failed to find plugin "cnv-bridge" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]

2. The issue occurs when attempting to schedule Fedora 37 and Fedora 38 VMs (I didn't check other OSes or versions).

3. The bug was found on an OVN-Kubernetes cluster. I didn't check it on OpenShiftSDN.

4. The same scenario with an ovs-bridge on the node passed successfully.

5. In CNV 4.14.0, this scenario was last observed to pass with CNV v4.14.0.rhel9-1328 (it may have passed on later versions as well, but this is the last version for which we have actual evidence that it passed).
According to VersionExplorer, this bundle includes:
multus-dynamic-networks-rhel9 v4.14.0-17
cluster-network-addons-operator-rhel9 v4.14.0-17

6. Also attached are:
- journalctl from the node where the VMI was attempted to be scheduled (one for the Fedora 37 and one for the Fedora 38 case)
- virt-launcher pod describe output (one for the Fedora 37 and one for the Fedora 38 case)
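
The error in item 1 lists the directories that were searched for the `cnv-bridge` plugin. A minimal shell sketch for checking those directories on the affected worker (for example, from a node debug shell) could look like the following. The function name `find_cni_plugin` and its output labels are hypothetical and not part of any tool mentioned in this report; the plugin name and directories are copied from the error message. It reports dangling symlinks separately, on the assumption that the linked fix ("linux bridge: Fix softlink path") concerns a symlink to the plugin binary.

```shell
#!/bin/sh
# Hypothetical helper: report where a CNI plugin binary is (or is not) found.
# Usage: find_cni_plugin PLUGIN_NAME DIR...
# Returns 0 if an executable plugin is found in any directory, 1 otherwise.
find_cni_plugin() {
    plugin=$1
    shift
    found=1
    for dir in "$@"; do
        path="$dir/$plugin"
        if [ -L "$path" ] && [ ! -e "$path" ]; then
            # The symlink exists but its target does not.
            echo "dangling symlink: $path -> $(readlink "$path")"
        elif [ -x "$path" ] && [ ! -d "$path" ]; then
            echo "found: $path"
            found=0
        else
            echo "missing or not executable: $path"
        fi
    done
    return "$found"
}

# Plugin name and directories taken from the FailedCreatePodSandBox message.
find_cni_plugin cnv-bridge /opt/multus/bin /var/lib/cni/bin /usr/libexec/cni \
    || echo "cnv-bridge not found in any search path"
```

If every directory is reported as missing or dangling, that matches the "failed to find plugin" failure described above.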

Comment 9 Guohua Ouyang 2023-08-01 23:14:08 UTC
I also see this issue. The events in the launcher pod show an error like the one below:
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-centos-stream8-following-partridge-bsmxx_default_9c3bc63c-226a-4c27-9106-a40c7c928758_0(6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7): error adding pod default_virt-launcher-centos-stream8-following-partridge-bsmxx to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-centos-stream8-following-partridge-bsmxx/9c3bc63c-226a-4c27-9106-a40c7c928758:ovn-nad]: error adding container to network "ovn-nad": CNI request failed with status 400: '[default/virt-launcher-centos-stream8-following-partridge-bsmxx 6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7 network ovn-nad NAD default/ovn-nad] [default/virt-launcher-centos-stream8-following-partridge-bsmxx 6b6bb8c0312d26c49ecff87a5d3baaefae97f4cd9176e7d2b548a8d09cd4fce7 network ovn-nad NAD default/ovn-nad] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '

Comment 10 Petr Horáček 2023-08-03 07:35:17 UTC
@gouyang that is a different issue. In the case of this BZ, the bridge CNI binary was not installed on the worker, so it could not be found. In your case, something is going wrong at the Kubernetes API level, not in the host filesystem. Have you tried reproducing that issue with Pods instead of VMs?
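
The distinction drawn in this comment can be sketched as a small shell check on the event text. The function name `classify_sandbox_error` and its output labels are hypothetical; the matched substrings are copied from the two error messages quoted in this bug, and other failures fall through to "unknown".

```shell
#!/bin/sh
# Hypothetical helper: classify a FailedCreatePodSandBox message by the
# substrings seen in the two errors quoted in this bug.
classify_sandbox_error() {
    case $1 in
        *'failed to find plugin'*)
            # CNI binary missing from the node's plugin directories (this BZ).
            echo "host-filesystem: CNI plugin binary not found" ;;
        *'failed to get pod annotation'*)
            # Timed out waiting for pod annotations (the error in comment 9).
            echo "kubernetes-api: pod annotation not available" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example input: the tail of the error from the bug description.
classify_sandbox_error 'error adding container to network "br1test": failed to find plugin "cnv-bridge" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]'
```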

Comment 11 Yossi Segev 2023-08-03 12:20:16 UTC
Verified by following the same scenario as in the BZ description.
CNV 4.14.0 (brew.registry.redhat.io/rh-osbs/iib:548986)
cluster-network-addons-operator-rhel9:v4.14.0-26

Comment 13 errata-xmlrpc 2023-11-08 14:06:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

