Bug 2015481
| Summary: | [4.10] sriov-network-operator daemon pods are failing to start | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ziv Greenberg <zgreenbe> | |
| Component: | Networking | Assignee: | Peng Liu <pliu> | |
| Networking sub component: | SR-IOV | QA Contact: | Ziv Greenberg <zgreenbe> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | urgent | |||
| Priority: | urgent | CC: | aasmith, dosmith, emacchi, gcheresh, juriarte, pliu, rlobillo, zshi, zzhao | |
| Version: | 4.10 | Flags: | pliu: needinfo- | |
| Target Milestone: | --- | |||
| Target Release: | 4.10.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2015834 2015835 | Environment: | ||
| Last Closed: | 2022-03-10 16:20:14 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2015834, 2015835, 2028256 | |||
|
Description
Ziv Greenberg
2021-10-19 10:16:17 UTC
The issue affects both the 4.8 and 4.9 releases. I have verified that an upstream patch by @pliu (https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/191/files) fixes the issue on the 4.9 release branch.

Hi Ziv, could you help check that the fix works on the 4.10 version?

Yes, of course. This is exactly what I have been trying to do for a couple of days now. The main problem is that 4.10 is currently not stable from the deployment point of view. I'm trying to find a stable puddle to work with. I'll update as soon as I have any progress.

Hi, I was able to verify it; please see the details below:

[cloud-user@installer-host ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-04-001635   True        False         71m     Cluster version is 4.10.0-0.nightly-2021-11-04-001635

[cloud-user@installer-host ~]$ oc get csv -n openshift-sriov-network-operator
NAME                                        DISPLAY                      VERSION              REPLACES   PHASE
performance-addon-operator.v4.9.0           Performance Addon Operator   4.9.0                           Succeeded
sriov-network-operator.4.9.0-202110182323   SR-IOV Network Operator      4.9.0-202110182323              Succeeded

[cloud-user@installer-host ~]$ oc get all -n openshift-sriov-network-operator
NAME                                         READY   STATUS    RESTARTS        AGE
pod/network-resources-injector-jp2l8         1/1     Running   0               44m
pod/network-resources-injector-p7tbw         1/1     Running   0               44m
pod/network-resources-injector-v8x6r         1/1     Running   0               44m
pod/sriov-device-plugin-knl7c                1/1     Running   0               31m
pod/sriov-network-config-daemon-67nhv        3/3     Running   7 (37m ago)     44m
pod/sriov-network-config-daemon-p5k2s        3/3     Running   7 (37m ago)     44m
pod/sriov-network-operator-976c7d6fc-4gjp8   1/1     Running   2 (32m ago)     44m

NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/network-resources-injector-service   ClusterIP   172.30.223.210   <none>        443/TCP   44m

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
daemonset.apps/network-resources-injector    3         3         3       3            3           beta.kubernetes.io/os=linux                                   44m
daemonset.apps/sriov-device-plugin           1         1         1       1            1           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   43m
daemonset.apps/sriov-network-config-daemon   2         2         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   44m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           44m

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-976c7d6fc   1         1         1       44m

One question: shouldn't we have the 4.10 version of the SR-IOV Network Operator instead of 4.9?

You should use a 4.10 image to verify. The fix has not yet been merged into the 4.9 branch. Please try the image sriov-network-operator.4.10.0-202111031923 or a newer one.

Hello Peng,
Sorry, I have no experience with this, as I have always used the latest marketplace version. Could you please explain how I should install and use this specific image? Additionally, if the fix is not yet merged into the 4.9 branch, how come it is working in my current environment?
Thanks.

@zzhao Could you help Ziv set up the QE operator repo in his environment?

*** Bug 2028246 has been marked as a duplicate of this bug. ***

Hello,
I was able to verify it and also created a dedicated DUT pod with attached SR-IOV VFs:

(shiftstack) [cloud-user@installer-host ~]$ oc get clusterversions.config.openshift.io
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-06-162419   True        False         76m     Cluster version is 4.10.0-0.nightly-2021-12-06-162419

(shiftstack) [cloud-user@installer-host ~]$ oc get csv -n openshift-sriov-network-operator
NAME                                         DISPLAY                      VERSION               REPLACES   PHASE
performance-addon-operator.v4.9.2            Performance Addon Operator   4.9.2                            Succeeded
sriov-network-operator.4.10.0-202112070531   SR-IOV Network Operator      4.10.0-202112070531              Succeeded

(shiftstack) [cloud-user@installer-host ~]$ oc get all -n openshift-sriov-network-operator
NAME                                         READY   STATUS    RESTARTS   AGE
pod/network-resources-injector-bbct4         1/1     Running   0          32m
pod/network-resources-injector-m9n8b         1/1     Running   0          32m
pod/network-resources-injector-z2nzp         1/1     Running   0          32m
pod/sriov-device-plugin-tz7sr                1/1     Running   0          2m54s
pod/sriov-network-config-daemon-lllf4        3/3     Running   3          32m
pod/sriov-network-config-daemon-ngdrq        3/3     Running   3          32m
pod/sriov-network-operator-dfdf7b466-dgw6t   1/1     Running   0          32m

NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/network-resources-injector-service   ClusterIP   172.30.171.60   <none>        443/TCP   32m

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
daemonset.apps/network-resources-injector    3         3         3       3            3           beta.kubernetes.io/os=linux                                   32m
daemonset.apps/sriov-device-plugin           1         1         1       1            1           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   3m29s
daemonset.apps/sriov-network-config-daemon   2         2         2       2            2           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   32m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           32m

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/sriov-network-operator-dfdf7b466   1         1         1       32m

(shiftstack) [cloud-user@installer-host ~]$ oc get pods
NAME           READY   STATUS    RESTARTS   AGE
dpdk-testpmd   1/1     Running   0          2m19s

Thanks,
Ziv

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
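
The version mismatch raised in the thread (a 4.9 operator CSV running on a 4.10 cluster) can be checked mechanically before treating a verification run as valid. The following is a minimal Python sketch and not part of the original report. The helper names `major_minor` and `operator_matches_cluster` are hypothetical, and the sketch assumes that CSV names follow the `<operator>.<version>` pattern seen in the `oc get csv` output above.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '4.10.0-0.nightly-...'.

    A leading 'v' (as in 'v4.9.2') is tolerated.
    """
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no major.minor prefix in {version!r}")
    return int(match.group(1)), int(match.group(2))


def operator_matches_cluster(cluster_version: str, csv_name: str) -> bool:
    """Check that an operator CSV carries the same major.minor as the cluster.

    Assumption: everything after the first dot in the CSV name is the
    operator version, e.g. 'sriov-network-operator.4.9.0-202110182323'.
    """
    _, _, operator_version = csv_name.partition(".")
    # Compare as integer tuples so that 4.10 is not confused with 4.1 or 4.9.
    return major_minor(operator_version) == major_minor(cluster_version)
```

Applied to the two verification attempts in this bug, the first pairing (cluster `4.10.0-0.nightly-2021-11-04-001635` with `sriov-network-operator.4.9.0-202110182323`) is reported as a mismatch, while the second (cluster `4.10.0-0.nightly-2021-12-06-162419` with `sriov-network-operator.4.10.0-202112070531`) matches.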