Bug 1969869

Summary: virt-launcher pod stuck in Init:0/1 status: error adding container to network "bigip1ens4f0vf2": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required
Product: Container Native Virtualization (CNV)
Reporter: Marius Cornea <mcornea>
Component: Networking
Assignee: Petr Horáček <phoracek>
Status: CLOSED DUPLICATE
QA Contact: Meni Yakove <myakove>
Severity: high
Priority: unspecified
Version: 2.6.5
CC: cnv-qe-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-06-16 12:24:13 UTC
Type: Bug

Description Marius Cornea 2021-06-09 11:26:26 UTC
Description of problem:

Following a 4.6.17 -> 4.6.25 -> 4.7.11 OCP upgrade and a CNV 2.5.7 -> 2.6.5 upgrade, one of the virt-launcher pods remains in Init:0/1 status, with the pod's describe events showing "error adding container to network "bigip1ens4f0vf2": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required".
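
The "VF pci addr is required" message is emitted by the SR-IOV CNI plugin when the CNI configuration handed to it contains no VF device ID. As a diagnostic sketch (not part of the original report), the NetworkAttachmentDefinition that the SR-IOV network operator generates for this network can be inspected; the network name and namespace below are taken from the error message and pod namespace above:

# Show the generated NetworkAttachmentDefinition and its resource annotation
# (k8s.v1.cni.cncf.io/resourceName), which is used to map the network to an
# allocated VF whose PCI address gets injected into the CNI config.
oc -n f5-lb get network-attachment-definitions bigip1ens4f0vf2 -o yaml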


[kni@ocp-edge18 ~]$ oc -n f5-lb get pods
NAME                         READY   STATUS     RESTARTS   AGE
virt-launcher-bigip0-5g7c7   1/1     Running    0          38m
virt-launcher-bigip1-gfx9q   0/1     Init:0/1   0          68m
[kni@ocp-edge18 ~]$ oc -n f5-lb describe pods virt-launcher-bigip1-gfx9q | grep -i fail
  Warning  FailedScheduling        69m                   default-scheduler  0/7 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity.
  Warning  FailedScheduling        69m                   default-scheduler  0/7 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity.
  Warning  FailedScheduling        68m                   default-scheduler  0/7 nodes are available: 1 node(s) were unschedulable, 3 node(s) didn't match Pod's node affinity, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling        62m                   default-scheduler  0/7 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity.
  Warning  FailedCreatePodSandBox  61m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-bigip1-gfx9q_f5-lb_28cb4897-3d43-4811-8787-c8609e0d5261_0(9b028b2436d825d3217322a0f93a94fc8e7d1b6af52af434b32edcaf2bd2b5be): [f5-lb/virt-launcher-bigip1-gfx9q:bigip1ens4f0vf2]: error adding container to network "bigip1ens4f0vf2": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required
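
Since the VF PCI address is normally discovered from the node's SR-IOV state, another hedged diagnostic step (not part of the original report) is to check the per-node state objects maintained by the SR-IOV network operator:

# List the discovered PFs/VFs and their PCI addresses for each node;
# the VFs backing the failing network should appear under status.interfaces.
oc -n openshift-sriov-network-operator get sriovnetworknodestates -o yaml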


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 4.6.17 with CNV and SR-IOV operators

2. Create the attached SriovNetworkNodePolicy, SriovNetwork, NodeNetworkConfigurationPolicy and VirtualMachine resources (an illustrative sketch of the SR-IOV resources follows this list)

3. Upgrade OCP to 4.6.25 and then to 4.7.11

4. Upgrade CNV to 2.6.5

5. Upgrade SR-IOV network operator from 4.6.0-202106010807.p0.git.78e7139 to 4.7.0-202105211528.p0
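
The manifests referenced in step 2 are attached to the bug and are not reproduced here. Purely as an illustrative sketch, a minimal SriovNetworkNodePolicy/SriovNetwork pair could look like the following; the resource name, PF name and VF count are hypothetical, while the network name and target namespace are taken from the error message and pod listing above:

# Illustrative only -- not the attached manifests.
cat <<'EOF' | oc apply -f -
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens4f0-policy            # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ens4f0vfs        # hypothetical resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4                      # hypothetical VF count
  nicSelector:
    pfNames:
      - ens4f0                   # hypothetical PF name
  deviceType: vfio-pci           # typical for VM passthrough with CNV
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: bigip1ens4f0vf2          # network name from the error message
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ens4f0vfs
  networkNamespace: f5-lb        # namespace from the pod listing
  ipam: '{}'
EOF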


Actual results:
The virt-launcher pod assigned to 1 of the 2 VirtualMachines is stuck in Init:0/1 status.

Expected results:
All virt-launcher pods are in Running status, as they were before the upgrade procedure.

Additional info:

After deleting the virt-launcher pod stuck in Init:0/1 status, it gets recreated and reaches Running state.
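
For reference, that workaround amounts to deleting the stuck pod and letting it be recreated (pod name and namespace taken from the listing above):

oc -n f5-lb delete pod virt-launcher-bigip1-gfx9q
oc -n f5-lb get pods -w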

Attaching must-gather and manifests used to create the VMs and networks.

Comment 1 Petr Horáček 2021-06-16 12:24:13 UTC

*** This bug has been marked as a duplicate of bug 1969870 ***