Bug 1969869

Summary: virt-launcher pod stuck in Init:0/1 status: error adding container to network "bigip1ens4f0vf2": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required
Product: Container Native Virtualization (CNV)
Reporter: Marius Cornea <mcornea>
Component: Networking
Assignee: Petr Horáček <phoracek>
Status: CLOSED DUPLICATE
QA Contact: Meni Yakove <myakove>
Severity: high
Priority: unspecified
Version: 2.6.5
CC: cnv-qe-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-06-16 12:24:13 UTC
Type: Bug

Description Marius Cornea 2021-06-09 11:26:26 UTC
Description of problem:

Following a 4.6.17 -> 4.6.25 -> 4.7.11 OCP upgrade and a CNV 2.5.7 -> 2.6.5 upgrade, one of the virt-launcher pods remains in Init:0/1 status, with the pod's describe events showing "error adding container to network "bigip1ens4f0vf2": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required".
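
The "VF pci addr is required" message is emitted by the SR-IOV CNI plugin when the CNI configuration handed to it contains no VF device ID. As a diagnostic sketch (not part of the original report), the NetworkAttachmentDefinition that the SR-IOV network operator generates for this network can be inspected; the network name and namespace below are taken from the error message and pod namespace above:

# Show the generated NetworkAttachmentDefinition and its resource annotation
# (k8s.v1.cni.cncf.io/resourceName), which is used to map the network to an
# allocated VF whose PCI address gets injected into the CNI config.
oc -n f5-lb get network-attachment-definitions bigip1ens4f0vf2 -o yaml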


[kni@ocp-edge18 ~]$ oc -n f5-lb get pods
NAME                         READY   STATUS     RESTARTS   AGE
virt-launcher-bigip0-5g7c7   1/1     Running    0          38m
virt-launcher-bigip1-gfx9q   0/1     Init:0/1   0          68m
[kni@ocp-edge18 ~]$ oc -n f5-lb describe pods virt-launcher-bigip1-gfx9q | grep -i fail
  Warning  FailedScheduling        69m                   default-scheduler  0/7 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity.
  Warning  FailedScheduling        69m                   default-scheduler  0/7 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity.
  Warning  FailedScheduling        68m                   default-scheduler  0/7 nodes are available: 1 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling        62m                   default-scheduler  0/7 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity.
  Warning  FailedCreatePodSandBox  61m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-bigip1-gfx9q_f5-lb_28cb4897-3d43-4811-8787-c8609e0d5261_0(9b028b2436d825d3217322a0f93a94fc8e7d1b6af52af434b32edcaf2bd2b5be): [f5-lb/virt-launcher-bigip1-gfx9q:bigip1ens4f0vf2]: error adding container to network "bigip1ens4f0vf2": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required
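
Since the VF PCI address is normally discovered from the node's SR-IOV state, another hedged diagnostic step (not part of the original report) is to check the per-node state objects maintained by the SR-IOV network operator:

# List the discovered PFs/VFs and their PCI addresses for each node;
# the VFs backing the failing network should appear under status.interfaces.
oc -n openshift-sriov-network-operator get sriovnetworknodestates -o yaml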


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 4.6.17 with CNV and SR-IOV operators

2. Create the attached SriovNetworkNodePolicy, SriovNetwork, NodeNetworkConfigurationPolicy and VirtualMachine resources (an illustrative sketch of the SR-IOV resources follows this list)

3. Upgrade OCP to 4.6.25 and then to 4.7.11

4. Upgrade CNV to 2.6.5

5. Upgrade SR-IOV network operator from 4.6.0-202106010807.p0.git.78e7139 to 4.7.0-202105211528.p0
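
The manifests referenced in step 2 are attached to the bug and are not reproduced here. Purely as an illustrative sketch, a minimal SriovNetworkNodePolicy/SriovNetwork pair could look like the following; the resource name, PF name and VF count are hypothetical, while the network name and target namespace are taken from the error message and pod listing above:

# Illustrative only -- not the attached manifests.
cat <<'EOF' | oc apply -f -
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens4f0-policy            # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ens4f0vfs        # hypothetical resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4                      # hypothetical VF count
  nicSelector:
    pfNames:
      - ens4f0                   # hypothetical PF name
  deviceType: vfio-pci           # typical for VM passthrough with CNV
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: bigip1ens4f0vf2          # network name from the error message
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ens4f0vfs
  networkNamespace: f5-lb        # namespace from the pod listing
  ipam: '{}'
EOF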


Actual results:
The virt-launcher pod assigned to 1 of the 2 VirtualMachines is stuck in Init:0/1 status.

Expected results:
All virt-launcher pods are in Running status, as they were before the upgrade procedure.

Additional info:

After deleting the virt-launcher pod stuck in Init:0/1 status, it gets recreated and reaches Running state.
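
For reference, that workaround amounts to deleting the stuck pod and letting it be recreated (pod name and namespace taken from the listing above):

oc -n f5-lb delete pod virt-launcher-bigip1-gfx9q
oc -n f5-lb get pods -w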

Attaching must-gather and manifests used to create the VMs and networks.

Comment 1 Petr Horáček 2021-06-16 12:24:13 UTC

*** This bug has been marked as a duplicate of bug 1969870 ***