Bug 2028256

Summary: [4.9] sriov-network-operator daemon pods are failing to start
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking
Assignee: Peng Liu <pliu>
Networking sub component: SR-IOV
QA Contact: Ziv Greenberg <zgreenbe>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: aasmith, dosmith, emacchi, gcheresh, juriarte, pliu, rlobillo, zshi, zzhao
Version: 4.8
Target Milestone: ---
Target Release: 4.9.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-04 18:41:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2015481
Bug Blocks: 2015835, 2028257

Comment 3 zenghui.shi 2021-12-13 14:33:35 UTC
*** Bug 2015834 has been marked as a duplicate of this bug. ***

Comment 4 Ziv Greenberg 2021-12-20 09:22:04 UTC
Hello,

I was able to verify the fix. I also created a dedicated DUT pod with SR-IOV VFs attached:

[cloud-user@installer-host ~]$ oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-12-15-103522   True        False         14h     Cluster version is 4.9.0-0.nightly-2021-12-15-103522
[cloud-user@installer-host ~]$ oc get csv -n openshift-sriov-network-operator
NAME                                        DISPLAY                      VERSION              REPLACES   PHASE
performance-addon-operator.v4.9.4           Performance Addon Operator   4.9.4                           Succeeded
sriov-network-operator.4.9.0-202112142229   SR-IOV Network Operator      4.9.0-202112142229              Succeeded
[cloud-user@installer-host ~]$ oc get all -n openshift-sriov-network-operator
NAME                                          READY   STATUS    RESTARTS   AGE
pod/network-resources-injector-cnzgf          1/1     Running   0          16m
pod/network-resources-injector-sckph          1/1     Running   0          16m
pod/network-resources-injector-tv5mk          1/1     Running   0          16m
pod/sriov-device-plugin-t9jrx                 1/1     Running   0          4m38s
pod/sriov-network-config-daemon-k7wsc         3/3     Running   0          16m
pod/sriov-network-config-daemon-sxtgp         3/3     Running   0          16m
pod/sriov-network-operator-7856c958bc-b4ws8   1/1     Running   0          17m

NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/network-resources-injector-service   ClusterIP   172.30.111.12   <none>        443/TCP   16m

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
daemonset.apps/network-resources-injector    3         3         3       3            3           beta.kubernetes.io/os=linux                                   16m
daemonset.apps/sriov-device-plugin           1         1         1       1            1           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   6m31s
daemonset.apps/sriov-network-config-daemon   2         2         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   16m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           17m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-7856c958bc   1         1         1       17m
[cloud-user@installer-host ~]$ oc get pods
NAME           READY   STATUS    RESTARTS   AGE
dpdk-testpmd   1/1     Running   0          3m43s
[cloud-user@installer-host ~]$ oc logs dpdk-testpmd | grep 'Virtual Function'
0000:00:05.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=
0000:00:06.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=
0000:00:05.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=
0000:00:06.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=

Thanks,
Ziv
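
The last check in the comment above (grepping the dpdk-testpmd log for Virtual Function entries bound to vfio-pci) could be scripted roughly as follows. This is only a sketch: the function name count_vfio_vfs is hypothetical and not part of the original verification, and it assumes the log lines keep the "<PCI address> '<description>' drv=<driver>" layout shown in the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the original report.
# Reads device-status lines on stdin and prints the number of distinct PCI
# addresses that are described as a "Virtual Function" and use drv=vfio-pci.
count_vfio_vfs() {
  awk '/Virtual Function/ && /drv=vfio-pci/ { seen[$1] = 1 }
       END { n = 0; for (addr in seen) n++; print n }'
}

# Usage against the pod from the comment (requires a logged-in oc session):
#   oc logs dpdk-testpmd | count_vfio_vfs
```

Counting distinct addresses rather than matching lines matters here because the log prints each VF more than once.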

Comment 7 errata-xmlrpc 2022-01-04 18:41:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.12 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5214