Created attachment 1781294
sriov-network-operator.log

Description of problem:
OLM is failing due to SR-IOV webhook error:

failed calling webhook "network-resources-injector-config.k8s.io": Post "https://network-resources-injector-service.openshift-sriov-network-operator.svc:443/mutate?timeout=10s": no endpoints available for service "network-resources-injector-service"

Version-Release number of selected component (if applicable):
registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-sriov-network-operator-bundle:v4.8.0-202105080740.p0-1

How reproducible:
Deploy latest 4.8 sriov operator from brew.

Steps to Reproduce:
Deploy latest 4.8 sriov operator from brew.

Actual results:
OLM is failing due to sriov webhook.

openshift-operator-lifecycle-manager   packageserver   Package Server   0.17.0   Failed

# oc describe replicaset -n openshift-operator-lifecycle-manager packageserver-5f5c58776d
...
Events:
  Type     Reason        Age                     From                   Message
  ----     ------        ----                    ----                   -------
  Warning  FailedCreate  3m13s (x92 over 7h39m)  replicaset-controller  Error creating: Internal error occurred: failed calling webhook "network-resources-injector-config.k8s.io": Post "https://network-resources-injector-service.openshift-sriov-network-operator.svc:443/mutate?timeout=10s": no endpoints available for service "network-resources-injector-service"

Expected results:
OLM should run and sriov deployment should succeed.

Additional info:
This is blocking our CI as we are unable to deploy PTP and SRIOV operators.
We also know that the previous build (v4.8.0.202105042126.p0-1) did not exhibit this error.
The problem was caused by the upgrade of the admission webhook to version v1. I opened a PR to fix it:
https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/131
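The thread does not say what exactly changed with the move to v1. One documented difference between the two API versions is that an omitted failurePolicy defaults to Ignore in admissionregistration.k8s.io/v1beta1 but to Fail in admissionregistration.k8s.io/v1, which would be consistent with pod creation being rejected while the injector service has no endpoints. The sketch below is a toy Python model of that default, not the operator's code. The Webhook class and admit function are hypothetical names, and whether the linked PR addresses this particular default is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Documented Kubernetes defaults used when failurePolicy is omitted.
DEFAULT_FAILURE_POLICY = {
    "admissionregistration.k8s.io/v1beta1": "Ignore",
    "admissionregistration.k8s.io/v1": "Fail",
}


@dataclass
class Webhook:
    """Hypothetical, heavily simplified view of one webhook entry."""
    api_version: str
    failure_policy: Optional[str] = None  # None means the field was omitted

    def effective_policy(self) -> str:
        # An explicit value wins; otherwise fall back to the version default.
        return self.failure_policy or DEFAULT_FAILURE_POLICY[self.api_version]


def admit(webhook: Webhook, backend_reachable: bool) -> bool:
    """Return True if the request would be let through in this toy model.

    Simplification: a reachable webhook is assumed to always allow the
    request, so only the "no endpoints available" case is interesting.
    """
    if backend_reachable:
        return True
    return webhook.effective_policy() == "Ignore"


old = Webhook("admissionregistration.k8s.io/v1beta1")
new = Webhook("admissionregistration.k8s.io/v1")
pinned = Webhook("admissionregistration.k8s.io/v1", failure_policy="Ignore")

for name, hook in (("v1beta1 default", old), ("v1 default", new), ("v1 pinned", pinned)):
    print(name, hook.effective_policy(), admit(hook, backend_reachable=False))
```

In this model, the same manifest with failurePolicy left out lets requests through under v1beta1 but rejects them under v1 when the webhook backend is down, and setting the field explicitly removes the dependence on the API version.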
@Sebastian Scheinkman It seems I did not hit this issue when I set up version 4.8.0-202105080740.p0:

# oc get csv
NAME                                           DISPLAY                   VERSION                 REPLACES   PHASE
sriov-network-operator.4.8.0-202105080740.p0   SR-IOV Network Operator   4.8.0-202105080740.p0              Succeeded

# oc get pod
NAME                                     READY   STATUS    RESTARTS   AGE
network-resources-injector-8mksv         1/1     Running   0          4m28s
network-resources-injector-949kk         1/1     Running   0          4m43s
network-resources-injector-glskv         1/1     Running   0          5m1s
operator-webhook-5kglx                   1/1     Running   0          4m43s
operator-webhook-qssgz                   1/1     Running   0          4m28s
operator-webhook-x9k8b                   1/1     Running   0          5m1s
sriov-cni-j6ntc                          2/2     Running   0          4m40s
sriov-device-plugin-g7hfr                1/1     Running   0          4m19s
sriov-network-config-daemon-wb424        1/1     Running   0          5m4s
sriov-network-config-daemon-xk48z        1/1     Running   0          4m44s
sriov-network-operator-cd85d8457-5swhp   1/1     Running   0          5m34s
I also tried upgrading from sriov-network-operator.4.8.0-202105042126.p0 to 4.8.0-202105080740.p0:

# oc get csv -n openshift-sriov-network-operator
NAME                                           DISPLAY                   VERSION                 REPLACES                                       PHASE
sriov-network-operator.4.8.0-202105080740.p0   SR-IOV Network Operator   4.8.0-202105080740.p0   sriov-network-operator.4.8.0-202105042126.p0   Succeeded
Hi Sabina,

Could you help verify this bug? Or could you provide the steps to reproduce the issue, since it does not happen on our QE side? Thanks.
(In reply to zhaozhanqi from comment #7)
> Hi, Sabina
>
> Could you help verified this bug? or if you can provide the steps to
> reproduce this issue since it's not happen in our QE side, thanks

Hi,

@Sebastian Scheinkman's PR fixed the issue. All the pods are up and running.

Tested image:
registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-sriov-network-operator-bundle:v4.8.0.202105111002.p0-1
Thanks, Sabina. Moving this bug to VERIFIED according to comment 8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438