Description of problem:
On the latest 4.8 nightly cluster on Power (ppc64le), installing the Elasticsearch operator from the OperatorHub UI on the OpenShift console creates the Subscription, but the CSV and the Elasticsearch operator pods are never created.

Index image used for the CatalogSource: registry-proxy.engineering.redhat.com/rh-osbs/iib:63233
Elasticsearch version: 5.0.2-8

# oc get clusterversion
NAME      VERSION                                     AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         24h     Cluster version is 4.8.0-0.nightly-ppc64le-2021-03-31-044112

# oc describe Subscription elasticsearch-operator -n openshift-operators-redhat
Name:         elasticsearch-operator
Namespace:    openshift-operators-redhat
Labels:       operators.coreos.com/elasticsearch-operator.openshift-operators-redhat=
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         Subscription
Metadata:
  Creation Timestamp:  2021-04-01T10:32:04Z
  Generation:          1
  Managed Fields:
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:channel:
        f:installPlanApproval:
        f:name:
        f:source:
        f:sourceNamespace:
        f:startingCSV:
    Manager:      Mozilla
    Operation:    Update
    Time:         2021-04-01T10:32:04Z
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:catalogHealth:
        f:conditions:
        f:lastUpdated:
    Manager:      catalog
    Operation:    Update
    Time:         2021-04-01T10:32:04Z
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:operators.coreos.com/elasticsearch-operator.openshift-operators-redhat:
    Manager:         olm
    Operation:       Update
    Time:            2021-04-01T10:32:04Z
  Resource Version:  480634
  UID:               21e11e72-3c27-444c-a5e3-70a787831ff5
Spec:
  Channel:                5.0
  Install Plan Approval:  Automatic
  Name:                   elasticsearch-operator
  Source:                 redhat-operators-logging
  Source Namespace:       openshift-marketplace
  Starting CSV:           elasticsearch-operator.5.0.2-8
Status:
  Catalog Health:
    Catalog Source Ref:
      API Version:       operators.coreos.com/v1alpha1
      Kind:              CatalogSource
      Name:              redhat-operators-logging
      Namespace:         openshift-marketplace
      Resource Version:  479878
      UID:               dca6c76c-1344-40ea-83d7-337c7bfc8f9b
    Healthy:             true
    Last Updated:        2021-04-01T10:32:05Z
  Conditions:
    Last Transition Time:  2021-04-01T10:32:05Z
    Message:               all available catalogsources are healthy
    Reason:                AllCatalogSourcesHealthy
    Status:                False
    Type:                  CatalogSourcesUnhealthy
  Last Updated:            2021-04-01T10:32:05Z
Events:                    <none>

How reproducible:
Have tried this on multiple clusters; got the same results every time.

Actual results:
Once the Subscription is created, the CSV is not generated and no operator pods are seen.

# oc get csv --all-namespaces | grep elastic
NAMESPACE   NAME   DISPLAY   VERSION   REPLACES   PHASE
#
# oc get pods -A | grep elastic
#

Expected results:
The CSV should show the phase as Succeeded and the operator pod must be Running.

Additional info:
The cluster is healthy, with 64 GB memory on the worker nodes.

# oc get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   26h   v1.20.0+29a606d
master-1   Ready    master   26h   v1.20.0+29a606d
master-2   Ready    master   26h   v1.20.0+29a606d
worker-0   Ready    worker   26h   v1.20.0+29a606d
worker-1   Ready    worker   25h   v1.20.0+29a606d

# oc get co
NAME                                       VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
baremetal                                  4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
cloud-credential                           4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
cluster-autoscaler                         4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
config-operator                            4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
console                                    4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
csi-snapshot-controller                    4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
dns                                        4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
etcd                                       4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
image-registry                             4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
ingress                                    4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      25h
insights                                   4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
kube-apiserver                             4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      25h
kube-controller-manager                    4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
kube-scheduler                             4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
kube-storage-version-migrator              4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
machine-api                                4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
machine-approver                           4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
machine-config                             4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      25h
marketplace                                4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
monitoring                                 4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      71m
network                                    4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
node-tuning                                4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
openshift-apiserver                        4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
openshift-controller-manager               4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
openshift-samples                          4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
operator-lifecycle-manager                 4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
operator-lifecycle-manager-catalog         4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      24h
service-ca                                 4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
storage                                    4.8.0-0.nightly-ppc64le-2021-03-31-044112   True        False         False      26h
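For anyone reproducing this from the CLI rather than the console, the Subscription described above corresponds to a manifest along these lines. This is a reconstruction from the `oc describe` output, not a copy of the actual applied YAML; it assumes the `redhat-operators-logging` CatalogSource already exists in openshift-marketplace.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elasticsearch-operator
  namespace: openshift-operators-redhat
spec:
  channel: "5.0"
  installPlanApproval: Automatic
  name: elasticsearch-operator
  source: redhat-operators-logging
  sourceNamespace: openshift-marketplace
  startingCSV: elasticsearch-operator.5.0.2-8
```

Applying this with `oc apply -f` reproduces the same state: the Subscription is created and reports healthy catalog sources, but no CSV appears.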
Based on recent CI results for P/Z and confirmation that this isn't specific to the Elasticsearch operator, I'm moving this over to the OLM team for investigation.

NFD is broken too:
https://coreos.slack.com/archives/CFFJUNP6C/p1617287692142800?thread_ts=1617201773.126200&cid=CFFJUNP6C

CI results for OLM timeouts on P/Z:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-remote-libvirt-s390x-4.8/1377591575755886592 (z)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-remote-libvirt-ppc64le-4.8/1377591575718137856 (p)

This behavior started for P&Z in the nightly builds at 3/30/21 UTC 00:00:00 and has been failing consistently since, suggesting that the disruptive patch landed on 3/29.
Based on the commit timeline, this could be related to the golang bump. https://github.com/openshift/operator-framework-olm/commits/master
After chatting with the Power testing team in Multi-Arch, I would like to propose escalating this bug to "Blocker+", as it is currently blocking Power regression testing on Elasticsearch and other operators. I'm therefore setting the severity to "High" and the "Blocker+" flag. After team evaluation, please feel free to reset the severity and Blocker flag status as needed.
We (OpenShift Virtualization) are experiencing the same symptoms in our upstream openshift-ci tests as well. The CSV is not being generated although a valid CatalogSource, OperatorGroup and Subscription are present.

Example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/kubevirt_hyperconverged-cluster-operator/1226/pull-ci-kubevirt-hyperconverged-cluster-operator-master-okd-hco-e2e-upgrade-aws/1379003601514401792#1:build-log.txt%3A187

Logs from the catalog-operator pod:

E0405 10:24:50.380819       1 queueinformer_operator.go:290] sync "kubevirt-hyperconverged" failed: constraints not satisfiable: no operators found matching the criteria of subscription hco-subscription-example, subscription hco-subscription-example exists
I0405 10:24:50.380930       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"kubevirt-hyperconverged", UID:"bd945845-94c5-4bec-8537-6462ae289c1b", APIVersion:"v1", ResourceVersion:"27794", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: no operators found matching the criteria of subscription hco-subscription-example, subscription hco-subscription-example exists
Proposing Target Release 4.8 and Blocker+. These may be subject to change after the devel team's evaluation.
I see three entries in registry-proxy.engineering.redhat.com/rh-osbs/iib:63233:

{"packageName":"elasticsearch-operator","channelName":"stable","csvName":"elasticsearch-operator.5.0.2-8"}
{"packageName":"elasticsearch-operator","channelName":"4.7","csvName":"elasticsearch-operator.4.7.0-202012212130.p0"}
{"packageName":"elasticsearch-operator","channelName":"4.6","csvName":"elasticsearch-operator.4.6.0-202103010126.p0"}

The Subscription spec references a channel named "5.0", but there don't appear to be any entries with that channel name in that index image. Is that a mistake in the Subscription spec, or is there something wrong with the index?
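The channel mismatch can be checked mechanically. A minimal sketch: the three entries reported above are processed locally here with sed/grep as static input; against a live index you would instead obtain the bundle list through the index's gRPC registry API rather than pasting JSON.

```shell
# The three bundle entries reported for iib:63233 (copied from this comment);
# against a live index these would come from the registry's gRPC API instead.
entries='{"packageName":"elasticsearch-operator","channelName":"stable","csvName":"elasticsearch-operator.5.0.2-8"}
{"packageName":"elasticsearch-operator","channelName":"4.7","csvName":"elasticsearch-operator.4.7.0-202012212130.p0"}
{"packageName":"elasticsearch-operator","channelName":"4.6","csvName":"elasticsearch-operator.4.6.0-202103010126.p0"}'

# List the channel names the index advertises for this package.
printf '%s\n' "$entries" | sed -n 's/.*"channelName":"\([^"]*\)".*/\1/p'

# Check whether any entry carries the "5.0" channel the Subscription asks for.
if printf '%s\n' "$entries" | grep -q '"channelName":"5.0"'; then
  echo "5.0 channel present"
else
  echo "5.0 channel missing"
fi
```

With the entries above, the channels printed are stable, 4.7 and 4.6, and the check reports the 5.0 channel as missing, which matches the resolver's failure to satisfy the Subscription.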
I had used the 5.0 channel during the installation of Elasticsearch. I used the same channel on earlier 4.8 builds in March, for which the CSV was generated.
Making Ben's Comment 9 un-private as Pravin is a partner engineer and cannot see private comments.
Where did this internal index image come from?
http://registry-proxy.engineering.redhat.com/rh-osbs/iib:63233

It seems that image just isn't built with the right set of metadata for 5.0. The 4.8 index image is pullable (though not officially released yet) and does contain a 5.0 channel:
registry.redhat.io/redhat/redhat-operator-index:v4.8
Created attachment 1769845 [details] NFD operator installation
I have tried installing the NFD and Elasticsearch operators on the latest build today on ppc64le and still observe the same installation issues. Does this mean the fix is not yet included in the nightlies?

# oc version
Client Version: 4.8.0-0.nightly-ppc64le-2021-04-06-162849
Server Version: 4.8.0-0.nightly-ppc64le-2021-04-06-162849
Kubernetes Version: v1.20.0+5f82cdb

The e2e test case still fails as well:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-remote-libvirt-ppc64le-4.8/1379584963803877376

For Elasticsearch, the index image used is from the Comet logging notifications:
registry-proxy.engineering.redhat.com/rh-osbs/iib:63233

Brew build info:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1558208

The CSV is not generated:

# oc get csv -A
NAMESPACE                              NAME            DISPLAY          VERSION   REPLACES   PHASE
openshift-operator-lifecycle-manager   packageserver   Package Server   0.17.0               Succeeded

# oc describe subscription elasticsearch-operator -n openshift-operators-redhat
Name:         elasticsearch-operator
Namespace:    openshift-operators-redhat
Labels:       operators.coreos.com/elasticsearch-operator.openshift-operators-redhat=
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         Subscription
Metadata:
  Creation Timestamp:  2021-04-07T10:58:08Z
  Generation:          1
  Managed Fields:
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:channel:
        f:installPlanApproval:
        f:name:
        f:source:
        f:sourceNamespace:
        f:startingCSV:
    Manager:      Mozilla
    Operation:    Update
    Time:         2021-04-07T10:58:08Z
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:catalogHealth:
        f:conditions:
        f:lastUpdated:
    Manager:      catalog
    Operation:    Update
    Time:         2021-04-07T10:58:08Z
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:operators.coreos.com/elasticsearch-operator.openshift-operators-redhat:
    Manager:         olm
    Operation:       Update
    Time:            2021-04-07T10:58:08Z
  Resource Version:  116495
  UID:               7c7866dc-63a6-4b72-a42f-4f3099e5f886
Spec:
  Channel:                5.0
  Install Plan Approval:  Automatic
  Name:                   elasticsearch-operator
  Source:                 redhat-operators-logging
  Source Namespace:       openshift-marketplace
  Starting CSV:           elasticsearch-operator.5.0.2-8
Status:
  Catalog Health:
    Catalog Source Ref:
      API Version:       operators.coreos.com/v1alpha1
      Kind:              CatalogSource
      Name:              operator-catalog-48
      Namespace:         openshift-marketplace
      Resource Version:  90493
      UID:               f3b1cea0-3043-4a11-8a64-845cbbba964f
    Healthy:             true
    Last Updated:        2021-04-07T10:58:08Z
    Catalog Source Ref:
      API Version:       operators.coreos.com/v1alpha1
      Kind:              CatalogSource
      Name:              redhat-operators-logging
      Namespace:         openshift-marketplace
      Resource Version:  90494
      UID:               11cfde33-94f6-4a11-91b1-65f96aa0c337
    Healthy:             true
    Last Updated:        2021-04-07T10:58:08Z
  Conditions:
    Last Transition Time:  2021-04-07T10:58:08Z
    Message:               all available catalogsources are healthy
    Reason:                AllCatalogSourcesHealthy
    Status:                False
    Type:                  CatalogSourcesUnhealthy
  Last Updated:            2021-04-07T10:58:08Z
Events:                    <none>

A similar issue occurs with NFD; used the following CatalogSource index image:
quay.io/openshift-release-dev/ocp-release-nightly:iib-int-index-art-operators-4.8

Used the 4.8 update channel to install the operator: attachment 1769845 [details]

~]# oc describe subscription nfd -n openshift-operators
Name:         nfd
Namespace:    openshift-operators
Labels:       operators.coreos.com/nfd.openshift-operators=
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         Subscription
Metadata:
  Creation Timestamp:  2021-04-07T10:11:18Z
  Generation:          1
  Managed Fields:
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:channel:
        f:installPlanApproval:
        f:name:
        f:source:
        f:sourceNamespace:
        f:startingCSV:
    Manager:      Mozilla
    Operation:    Update
    Time:         2021-04-07T10:11:18Z
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:catalogHealth:
        f:conditions:
        f:currentCSV:
        f:installPlanGeneration:
        f:installPlanRef:
          .:
          f:apiVersion:
          f:kind:
          f:name:
          f:namespace:
          f:resourceVersion:
          f:uid:
        f:installplan:
          .:
          f:apiVersion:
          f:kind:
          f:name:
          f:uuid:
        f:lastUpdated:
        f:state:
    Manager:      catalog
    Operation:    Update
    Time:         2021-04-07T10:11:18Z
    API Version:  operators.coreos.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:operators.coreos.com/nfd.openshift-operators:
    Manager:         olm
    Operation:       Update
    Time:            2021-04-07T10:11:18Z
  Resource Version:  102718
  UID:               c8ab48e4-50bd-4708-9db3-ee470bff9994
Spec:
  Channel:                4.8
  Install Plan Approval:  Automatic
  Name:                   nfd
  Source:                 operator-catalog-48
  Source Namespace:       openshift-marketplace
  Starting CSV:           nfd.4.8.0-202102270107.p0
Status:
  Catalog Health:
    Catalog Source Ref:
      API Version:       operators.coreos.com/v1alpha1
      Kind:              CatalogSource
      Name:              operator-catalog-48
      Namespace:         openshift-marketplace
      Resource Version:  90493
      UID:               f3b1cea0-3043-4a11-8a64-845cbbba964f
    Healthy:             true
    Last Updated:        2021-04-07T10:11:18Z
    Catalog Source Ref:
      API Version:       operators.coreos.com/v1alpha1
      Kind:              CatalogSource
      Name:              redhat-operators-logging
      Namespace:         openshift-marketplace
      Resource Version:  90494
      UID:               11cfde33-94f6-4a11-91b1-65f96aa0c337
    Healthy:             true
    Last Updated:        2021-04-07T10:11:18Z
  Conditions:
    Last Transition Time:  2021-04-07T10:11:18Z
    Message:               all available catalogsources are healthy
    Reason:                AllCatalogSourcesHealthy
    Status:                False
    Type:                  CatalogSourcesUnhealthy
    Last Transition Time:  2021-04-07T10:11:18Z
    Reason:                Installing
    Status:                True
    Type:                  InstallPlanPending
  Current CSV:             nfd.4.8.0-202102270107.p0
  Install Plan Generation: 1
  Install Plan Ref:
    API Version:       operators.coreos.com/v1alpha1
    Kind:              InstallPlan
    Name:              install-s992z
    Namespace:         openshift-operators
    Resource Version:  102712
    UID:               2b063b15-5cdb-46be-8d2e-0b5cdc359a5c
  Installplan:
    API Version:  operators.coreos.com/v1alpha1
    Kind:         InstallPlan
    Name:         install-s992z
    Uuid:         2b063b15-5cdb-46be-8d2e-0b5cdc359a5c
  Last Updated:   2021-04-07T10:11:18Z
  State:          UpgradePending
Events:           <none>

# oc get csv -A
NAMESPACE                              NAME            DISPLAY          VERSION   REPLACES   PHASE
openshift-operator-lifecycle-manager   packageserver   Package Server   0.17.0               Succeeded
Here is some more detail connecting this to CI results.

Failing test:
[sig-operator] an end user can use OLM can subscribe to the operator [Suite:openshift/conformance/parallel]

https://sippy.ci.openshift.org/testdetails?release=4.8&test=%5Bsig-operator%5D+an+end+user+can+use+OLM+can+subscribe+to+the+operator+%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D
The issue seems to be that this uses bundle images which aren't built for non-amd64 architectures.

Describing the pod that fails:

  Normal   Pulled   76s                kubelet  Successfully pulled image "registry.redhat.io/openshift4/ose-local-storage-operator-bundle@sha256:e2422f742c78657d0de42a92b9ff438836155243b800e631eb1ba9af5fda3cc1" in 1.174664188s
  Warning  BackOff  74s (x2 over 75s)  kubelet  Back-off restarting failed container

Similarly, for the logging and NFD operators the linked bundle image is not built multi-arch, so it will CrashLoopBackOff; I assume this is across the board.
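One way to check whether an image is built for multiple architectures is to inspect its manifest list (for example the raw JSON returned by `skopeo inspect --raw docker://<image>`). A minimal sketch, using a hypothetical manifest list as static input since pulling from registry.redhat.io requires credentials:

```shell
# Hypothetical manifest list for a multi-arch image, in the JSON shape a
# registry returns for `skopeo inspect --raw`; a single-arch bundle would
# instead return a plain image manifest with no "manifests" array.
manifest_list='{"mediaType":"application/vnd.docker.distribution.manifest.list.v2+json",
"manifests":[
 {"platform":{"architecture":"amd64","os":"linux"}},
 {"platform":{"architecture":"ppc64le","os":"linux"}},
 {"platform":{"architecture":"s390x","os":"linux"}}]}'

# Extract the architectures the image was built for. If "ppc64le"/"s390x"
# are absent, the bundle pod on those nodes will fail as described above.
printf '%s\n' "$manifest_list" | grep -o '"architecture":"[^"]*"' | cut -d'"' -f4
```

For the hypothetical input above this prints amd64, ppc64le and s390x; running the same extraction against the real bundle image's manifest would show whether Power/Z variants exist.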
After working through the issue with Ben and Kevin on the OLM team: the bundles are built as no-arch images, so they should work fine. The fix itself should address this, but it isn't included in the latest nightlies, so I'll be following up on that now.
The fix not being included in the latest builds is expected for now, due to some issues ART is experiencing. I'll monitor this and test again once the fix is included in the latest builds.
The notifications for Elasticsearch builds show the newer builds in the Aborted state. I'm monitoring them and will verify this once I see them passing.
Verified with the 5.0.2-18 version of Elasticsearch. Installation successful.

oc get clusterversion
NAME      VERSION                                     AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-ppc64le-2021-04-08-063005   True        False         5d1h    Cluster version is 4.8.0-0.nightly-ppc64le-2021-04-08-063005

# oc get csv -n openshift-logging
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
cluster-logging.5.0.2-18          Red Hat OpenShift Logging          5.0.2-18              Succeeded
elasticsearch-operator.5.0.2-18   OpenShift Elasticsearch Operator   5.0.2-18              Succeeded

# oc get pods -n openshift-logging
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-748bcb587b-4pnrx       1/1     Running     0          19h
curator-26971410-k99tb                          0/1     Completed   0          7h38m
elasticsearch-cdm-qih5bz15-1-668949dbd-2t86v    2/2     Running     0          19h
elasticsearch-cdm-qih5bz15-2-7d89fdf6d7-jg4kk   2/2     Running     0          19h
elasticsearch-cdm-qih5bz15-3-5cc4d4878b-9c6ml   2/2     Running     0          19h
elasticsearch-im-app-26971860-gsdg5             0/1     Completed   0          8m26s
elasticsearch-im-audit-26971860-9wjkw           0/1     Completed   0          8m26s
elasticsearch-im-infra-26971860-hpxzw           0/1     Completed   0          8m26s
fluentd-fwhbz                                   1/1     Running     0          19h
fluentd-qjvmr                                   1/1     Running     0          19h
fluentd-twvq8                                   1/1     Running     0          19h
fluentd-x4b48                                   1/1     Running     0          19h
fluentd-xjdq8                                   1/1     Running     0          19h
fluentd-zbqhq                                   1/1     Running     0          19h
kibana-6984cc4797-nh2ln                         2/2     Running     0          19h
Hi Pravin,

Thanks for your verification! Changing the status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438