Description of problem (please be as detailed as possible and provide log snippets):
Storage System is stuck in status "Condition: Progressing" while deploying a minimal deployment cluster.

Version of all relevant components (if applicable):
(yulienv) [ypersky@ypersky ocs-ci]$ oc get csv -A
NAMESPACE                              NAME                   DISPLAY                       VERSION   REPLACES   PHASE
openshift-operator-lifecycle-manager   packageserver          Package Server                0.19.0               Succeeded
openshift-storage                      mcg-operator.v4.10.0   NooBaa Operator               4.10.0               Succeeded
openshift-storage                      ocs-operator.v4.10.0   OpenShift Container Storage   4.10.0               Succeeded
openshift-storage                      odf-operator.v4.10.0   OpenShift Data Foundation     4.10.0               Succeeded
(yulienv) [ypersky@ypersky ocs-ci]$
ODF build: 4.10.0.161

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes! Deployment is stuck.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes, I saw it a number of times on various deployments.

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an OCP cluster with minimal resources (8 CPUs per worker node).
2. Start deploying ODF 4.10.0.160/161.
3.

Actual results:
After creating the storage system, it stays in status Condition: Progressing forever. The deployment is stuck with this warning:
Warning  ReconcileFailed  22m  StorageSystem controller  InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0

Expected results:
Storage System should reach status Ready.
Additional info:
(yulienv) [ypersky@ypersky ocs-ci]$ oc describe storagesystem -A
Name:         ocs-storagecluster-storagesystem
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>
API Version:  odf.openshift.io/v1alpha1
Kind:         StorageSystem
Metadata:
  Creation Timestamp:  2022-02-21T21:30:03Z
  Finalizers:
    storagesystem.odf.openshift.io
  Generation:  1
  Managed Fields:
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:kind:
        f:name:
        f:namespace:
    Manager:      Mozilla
    Operation:    Update
    Time:         2022-02-21T21:30:03Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"storagesystem.odf.openshift.io":
    Manager:      manager
    Operation:    Update
    Time:         2022-02-21T21:30:03Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:      manager
    Operation:    Update
    Subresource:  status
    Time:         2022-02-21T21:30:03Z
  Resource Version:  49832
  UID:               62d24b32-c074-45c9-81f1-91722ca3da02
Spec:
  Kind:       storagecluster.ocs.openshift.io/v1
  Name:       ocs-storagecluster
  Namespace:  openshift-storage
Status:
  Conditions:
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               Reconcile is in progress
    Reason:                Reconciling
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               Reconcile is in progress
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               StorageSystem CR is valid
    Reason:                Valid
    Status:                False
    Type:                  StorageSystemInvalid
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
    Reason:                NotReady
    Status:                False
    Type:                  VendorCsvReady
    Last Heartbeat Time:   2022-02-21T21:30:03Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               Initializing StorageSystem
    Reason:                Init
    Status:                Unknown
    Type:                  VendorSystemPresent
Events:
  Type     Reason           Age   From                      Message
  ----     ------           ----  ----                      -------
  Warning  ReconcileFailed  22m   StorageSystem controller  InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
(yulienv) [ypersky@ypersky ocs-ci]$
Please note that must-gather logs are available here: rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/OCS/ocs-qe-bugs/bz-2056697/
Link to UI: https://console-openshift-console.apps.lr5-ypersky-depmi.qe.rh-ocs.com/k8s/ns/openshift-storage/operators.coreos.com~v1alpha1~ClusterServiceVersion/odf-operator.v4.10.0/odf.openshift.io~v1alpha1~StorageSystem
Link to cluster details: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10197/
odf-operator creates the odf-csi-addons-operator subscription, and that subscription points at the redhat-operators catalog. It is not aware of the odf-catalogsource catalog. So whenever you create a CatalogSource, name it redhat-operators; the installation should then work without any issues. In the output below, no redhat-operators catalog exists, and the subscription status reports it as missing.

$ oc get catalogsources.operators.coreos.com -n openshift-marketplace
NAME                  DISPLAY                     TYPE   PUBLISHER   AGE
certified-operators   Certified Operators         grpc   Red Hat     6h19m
community-operators   Community Operators         grpc   Red Hat     6h19m
odf-catalogsource     OpenShift Data Foundation   grpc   Red Hat     5h59m
redhat-marketplace    Red Hat Marketplace         grpc   Red Hat     6h19m

$ oc get sub odf-csi-addons-operator -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-csi-addons-operator
  namespace: openshift-storage
spec:
  channel: stable-4.10
  installPlanApproval: Automatic
  name: odf-csi-addons-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: odf-csi-addons-operator.v4.10.0
status:
  conditions:
  - lastTransitionTime: "2022-02-21T21:26:08Z"
    message: targeted catalogsource openshift-marketplace/redhat-operators missing
    reason: UnhealthyCatalogSourceFound
    status: "True"
    type: CatalogSourcesUnhealthy
  - message: 'constraints not satisfiable: subscription odf-csi-addons-operator exists, no operators found from catalog redhat-operators in namespace openshift-marketplace referenced by subscription odf-csi-addons-operator'
    reason: ConstraintsNotSatisfiable
    status: "True"
    type: ResolutionFailed
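The mismatch described above can be checked mechanically. The following is only a minimal sketch, assuming the catalog names and the subscription have already been fetched (for example with the two `oc get` commands above) and loaded into Python; the function name `find_missing_catalog` is hypothetical and not part of any ODF or OLM tooling.

```python
def find_missing_catalog(subscription, catalog_names):
    """Return the subscription's source if no CatalogSource has that name, else None."""
    source = subscription["spec"]["source"]
    return None if source in catalog_names else source


# Catalog names copied from the `oc get catalogsources` output above.
catalogs = {
    "certified-operators",
    "community-operators",
    "odf-catalogsource",
    "redhat-marketplace",
}

# Relevant fields copied from the `oc get sub odf-csi-addons-operator -o yaml` output above.
subscription = {
    "metadata": {"name": "odf-csi-addons-operator"},
    "spec": {
        "source": "redhat-operators",
        "sourceNamespace": "openshift-marketplace",
    },
}

missing = find_missing_catalog(subscription, catalogs)
if missing is not None:
    print(f"subscription points at catalog {missing!r}, which does not exist")
```

With the values from this cluster, the check reports redhat-operators as the missing catalog, which matches the CatalogSourcesUnhealthy condition on the subscription.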
Please try as suggested by Nitin.
Before running oc create -f /<PATH>/catalog-source.yaml, we also run this command:
oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
Maybe that is what caused the problem?
It should not. As you can see in the output above, there was no redhat-operators catalog. This means the command was run on the setup, but the user did not then create the redhat-operators catalog as expected.
@Nitin, I've deployed another OCP 4.10 cluster and then created a catalog source with this content:
======================================
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    ocs-operator-internal: 'true'
  name: odf-catalogsource
  namespace: openshift-marketplace
spec:
  displayName: OpenShift Data Foundation
  icon:
    base64data: PHN2ZyBpZD0iTGF5ZXJfMSIgZGF0YS1uYW1lPSJMYXllciAxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAxOTIgMTQ1Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2UwMDt9PC9zdHlsZT48L2RlZnM+PHRpdGxlPlJlZEhhdC1Mb2dvLUhhdC1Db2xvcjwvdGl0bGU+PHBhdGggZD0iTTE1Ny43Nyw2Mi42MWExNCwxNCwwLDAsMSwuMzEsMy40MmMwLDE0Ljg4LTE4LjEsMTcuNDYtMzAuNjEsMTcuNDZDNzguODMsODMuNDksNDIuNTMsNTMuMjYsNDIuNTMsNDRhNi40Myw2LjQzLDAsMCwxLC4yMi0xLjk0bC0zLjY2LDkuMDZhMTguNDUsMTguNDUsMCwwLDAtMS41MSw3LjMzYzAsMTguMTEsNDEsNDUuNDgsODcuNzQsNDUuNDgsMjAuNjksMCwzNi40My03Ljc2LDM2LjQzLTIxLjc3LDAtMS4wOCwwLTEuOTQtMS43My0xMC4xM1oiLz48cGF0aCBjbGFzcz0iY2xzLTEiIGQ9Ik0xMjcuNDcsODMuNDljMTIuNTEsMCwzMC42MS0yLjU4LDMwLjYxLTE3LjQ2YTE0LDE0LDAsMCwwLS4zMS0zLjQybC03LjQ1LTMyLjM2Yy0xLjcyLTcuMTItMy4yMy0xMC4zNS0xNS43My0xNi42QzEyNC44OSw4LjY5LDEwMy43Ni41LDk3LjUxLjUsOTEuNjkuNSw5MCw4LDgzLjA2LDhjLTYuNjgsMC0xMS42NC01LjYtMTcuODktNS42LTYsMC05LjkxLDQuMDktMTIuOTMsMTIuNSwwLDAtOC40MSwyMy43Mi05LjQ5LDI3LjE2QTYuNDMsNi40MywwLDAsMCw0Mi41Myw0NGMwLDkuMjIsMzYuMywzOS40NSw4NC45NCwzOS40NU0xNjAsNzIuMDdjMS43Myw4LjE5LDEuNzMsOS4wNSwxLjczLDEwLjEzLDAsMTQtMTUuNzQsMjEuNzctMzYuNDMsMjEuNzdDNzguNTQsMTA0LDM3LjU4LDc2LjYsMzcuNTgsNTguNDlhMTguNDUsMTguNDUsMCwwLDEsMS41MS03LjMzQzIyLjI3LDUyLC41LDU1LC41LDc0LjIyYzAsMzEuNDgsNzQuNTksNzAuMjgsMTMzLjY1LDcwLjI4LDQ1LjI4LDAsNTYuNy0yMC40OCw1Ni43LTM2LjY1LDAtMTIuNzItMTEtMjcuMTYtMzAuODMtMzUuNzgiLz48L3N2Zz4=
    mediatype: image/svg+xml
  image: quay.io/rhceph-dev/ocs-registry:4.10.0-161
  priority: 100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 15m
======================================
(yulienv) [ypersky@ypersky ocs-ci]$ oc get catalogsources.operators.coreos.com -n openshift-marketplace
NAME                  DISPLAY                     TYPE   PUBLISHER   AGE
certified-operators   Certified Operators         grpc   Red Hat     11h
community-operators   Community Operators         grpc   Red Hat     11h
odf-catalogsource     OpenShift Data Foundation   grpc   Red Hat     7m29s
redhat-marketplace    Red Hat Marketplace         grpc   Red Hat     11h
redhat-operators      Red Hat Operators           grpc   Red Hat     11h
(yulienv) [ypersky@ypersky ocs-ci]$

And still the StorageSystem is stuck in Progressing:
Events:
  Type     Reason           Age    From                      Message
  ----     ------           ----   ----                      -------
  Warning  ReconcileFailed  5m6s   StorageSystem controller  CSV is not successfully installed; InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
  Warning  ReconcileFailed  4m55s  StorageSystem controller  InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
(yulienv) [ypersky@ypersky ocs-ci]$

Looks like the problem is persistent. You are welcome to check this cluster: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10227/
@Nitin, I will try those exact steps and let you know.
@Nitin, I've performed the following steps:
1) Deployed another OCP cluster.
2) After the successful OCP deployment, ran this command:
oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
3) Created a catalog source with the content in comment#9.

Result: Storage System status is StorageSystem
However, the following event appears:
Warning  ReconcileFailed  39s  StorageSystem controller  StorageCluster.ocs.openshift.io "ocs-storagecluster" not found

The cluster details are: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10253/

(yulienv) [ypersky@ypersky ocs-ci]$ oc describe storagesystem -A
Name:         ocs-storagecluster-storagesystem
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>
API Version:  odf.openshift.io/v1alpha1
Kind:         StorageSystem
Metadata:
  Creation Timestamp:  2022-02-24T09:50:19Z
  Finalizers:
    storagesystem.odf.openshift.io
  Generation:  1
  Managed Fields:
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:kind:
        f:name:
        f:namespace:
    Manager:      Mozilla
    Operation:    Update
    Time:         2022-02-24T09:50:19Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"storagesystem.odf.openshift.io":
    Manager:      manager
    Operation:    Update
    Time:         2022-02-24T09:50:19Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:      manager
    Operation:    Update
    Subresource:  status
    Time:         2022-02-24T09:50:19Z
  Resource Version:  511934
  UID:               a7edcf27-4e4b-4035-8426-f3d0eb007c8e
Spec:
  Kind:       storagecluster.ocs.openshift.io/v1
  Name:       ocs-storagecluster
  Namespace:  openshift-storage
Status:
  Conditions:
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Message:               StorageSystem CR is valid
    Reason:                Valid
    Status:                False
    Type:                  StorageSystemInvalid
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Reason:                Ready
    Status:                True
    Type:                  VendorCsvReady
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Reason:                Found
    Status:                True
    Type:                  VendorSystemPresent
Events:
  Type     Reason           Age    From                      Message
  ----     ------           ----   ----                      -------
  Warning  ReconcileFailed  3m57s  StorageSystem controller  StorageCluster.ocs.openshift.io "ocs-storagecluster" not found
(yulienv) [ypersky@ypersky ocs-ci]$

Please advise.
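To compare this output with the one in the original report, it can help to look only at the condition statuses rather than at the events. The sketch below is an illustration, not the operator's own logic: the expected statuses are simply taken from the healthy output above, and the helper name `blocking_conditions` is hypothetical.

```python
# Expected status per condition type, taken from the healthy `oc describe` output above.
EXPECTED = {
    "Available": "True",
    "Progressing": "False",
    "StorageSystemInvalid": "False",
    "VendorCsvReady": "True",
    "VendorSystemPresent": "True",
}


def blocking_conditions(conditions):
    """Return (type, message) for every condition whose status differs from EXPECTED."""
    return [
        (c["type"], c.get("message", ""))
        for c in conditions
        if c["type"] in EXPECTED and c["status"] != EXPECTED[c["type"]]
    ]


# Conditions from the original report (cluster stuck in Progressing).
stuck = [
    {"type": "Available", "status": "False", "message": "Reconcile is in progress"},
    {"type": "Progressing", "status": "True", "message": "Reconcile is in progress"},
    {"type": "StorageSystemInvalid", "status": "False", "message": "StorageSystem CR is valid"},
    {"type": "VendorCsvReady", "status": "False",
     "message": "InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0"},
    {"type": "VendorSystemPresent", "status": "Unknown", "message": "Initializing StorageSystem"},
]

# Conditions from the output above (after following the suggested steps).
healthy = [
    {"type": "Available", "status": "True", "message": "Reconcile is completed successfully"},
    {"type": "Progressing", "status": "False", "message": "Reconcile is completed successfully"},
    {"type": "StorageSystemInvalid", "status": "False", "message": "StorageSystem CR is valid"},
    {"type": "VendorCsvReady", "status": "True"},
    {"type": "VendorSystemPresent", "status": "True"},
]

for name, message in blocking_conditions(stuck):
    print(f"{name}: {message}")
```

Under these assumptions the second cluster has no blocking conditions, even though a ReconcileFailed warning event is still listed.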
Must-gather logs are available here (if needed in the future): rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/OCS/ocs-qe-bugs/bz-2056697/logs-20220224-115534/
Nitin mentioned in #comment12 that the final status is good, so I am not sure why this BZ was reopened. Please provide a justification. If it was reopened because of the failure event, that was just an intermittent error that the operator rectified automatically.
Same here - odf-csi-addons-operator could not be installed in a disconnected cluster.
```
# oc get subscriptions
NAME                                                                  PACKAGE                  SOURCE                  CHANNEL
mcg-operator-stable-4.10-redhat-operator-index-openshift-marketplace  mcg-operator             redhat-operator-index   stable-4.10
ocs-operator-stable-4.10-redhat-operator-index-openshift-marketplace  ocs-operator             redhat-operator-index   stable-4.10
odf-csi-addons-operator                                               odf-csi-addons-operator  redhat-operators        stable-4.10
odf-operator                                                          odf-operator             redhat-operator-index   stable-4.10
```
`redhat-operator-index` is the name of the CatalogSource that points at my private registry. All operators were installed from it except odf-csi-addons-operator, whose subscription points at `redhat-operators`. The `redhat-operators` source is disabled in my cluster. I can install that operator in a "connected" cluster.

I fixed the problem with:
```
oc patch subscription/odf-csi-addons-operator --type merge -p '{"spec":{"source":"redhat-operator-index"}}'
```
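If the catalog name differs between clusters, the merge patch from the workaround above can be generated rather than typed by hand. This is only a sketch: `source_patch` is a hypothetical helper, and the resulting argument list is meant to be passed to something like `subprocess.run` on a machine where `oc` is logged in to the cluster.

```python
import json


def source_patch(catalog_name):
    """Build the JSON merge patch that repoints a Subscription at another catalog."""
    return json.dumps({"spec": {"source": catalog_name}}, separators=(",", ":"))


# Catalog name taken from the workaround above; adjust it to the local CatalogSource name.
patch = source_patch("redhat-operator-index")

# Same command as in the workaround, expressed as an argument list.
command = [
    "oc", "patch", "subscription/odf-csi-addons-operator",
    "--type", "merge",
    "-p", patch,
]
print(" ".join(command))
```

Building the patch with `json.dumps` avoids quoting mistakes when the catalog name comes from a variable.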
Hi Nitin, thank you for your reply. Yes, I understand that we can either change the subscription source or rename the CatalogSource. But customers who use a disconnected environment will hit this problem 100% of the time, because opm and oc-mirror, which are used to create the disconnected CatalogSource, create the CatalogSource CR with the default name "redhat-operator-index", not "redhat-operators". That is the reason for what I mentioned earlier: we need to either add a warning or note to the disconnected ODF docs, or restructure the ODF CSV so that the subscription source does not have to be the hard-coded name "redhat-operators". Otherwise, customers who follow our official docs to set up the disconnected OperatorHub will keep hitting this problem.
*** Bug 2066211 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156