Description of problem (please be as detailed as possible and provide log snippets):
After upgrade from 4.8 to 4.9.1, the storagesystem is not ready.

Version of all relevant components (if applicable):
Upgraded from: 4.8.6
Upgraded to image: quay.io/rhceph-dev/ocs-registry:4.9.1-252.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Upgrade ODF from 4.8 to 4.9
2. Check the storagesystem status

Actual results:
The storage system is not in the expected state.

Expected results:
The storage system should be in the Ready state.

Additional info:

Storage system state:

$ oc get storagesystem -o yaml
apiVersion: v1
items:
- apiVersion: odf.openshift.io/v1alpha1
  kind: StorageSystem
  metadata:
    creationTimestamp: "2021-12-18T00:58:29Z"
    finalizers:
    - storagesystem.odf.openshift.io
    generation: 1
    managedFields:
    - apiVersion: odf.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"storagesystem.odf.openshift.io": {}
        f:spec:
          .: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
      manager: manager
      operation: Update
      time: "2021-12-18T00:58:29Z"
    - apiVersion: odf.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2021-12-18T00:58:29Z"
    name: ocs-storagecluster-storagesystem
    namespace: openshift-storage
    resourceVersion: "2287583"
    uid: 2902a8fe-0e6f-4a57-9171-92a012becb23
  spec:
    kind: storagecluster.ocs.openshift.io/v1
    name: ocs-storagecluster
    namespace: openshift-storage
  status:
    conditions:
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Reconcile is in progress
      reason: Reconciling
      status: "False"
      type: Available
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Reconcile is in progress
      reason: Reconciling
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: StorageSystem CR is valid
      reason: Valid
      status: "False"
      type: StorageSystemInvalid
    - lastHeartbeatTime: "2021-12-18T00:58:39Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: ClusterServiceVersion.operators.coreos.com "mcg-operator.v4.9.1" not found; ClusterServiceVersion.operators.coreos.com "ocs-operator.v4.9.1" not found
      reason: NotReady
      status: "False"
      type: VendorCsvReady
    - lastHeartbeatTime: "2021-12-18T00:58:29Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Initializing StorageSystem
      reason: Init
      status: Unknown
      type: VendorSystemPresent
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

> All CSVs are in the Succeeded state

$ oc get csv
NAME                  DISPLAY                       VERSION   REPLACES              PHASE
mcg-operator.v4.9.1   NooBaa Operator               4.9.1     mcg-operator.v4.9.0   Succeeded
ocs-operator.v4.9.1   OpenShift Container Storage   4.9.1     ocs-operator.v4.8.6   Succeeded
odf-operator.v4.9.1   OpenShift Data Foundation     4.9.1     odf-operator.v4.9.0   Succeeded

> Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2647//consoleFull

Must gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-124vu1cs33-ua/j-124vu1cs33-ua_20211217T233900/logs/failed_testcase_ocs_logs_1639786979/test_upgrade_ocs_logs/
cluster is alive for debugging if needed
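For reference, the failing condition can be inspected directly with a jsonpath query (a sketch; the resource name and namespace are taken from the output above):

$ oc get storagesystem ocs-storagecluster-storagesystem -n openshift-storage \
    -o jsonpath='{.status.conditions[?(@.type=="VendorCsvReady")].message}'

This prints the "ClusterServiceVersion ... not found" message even though oc get csv reports all CSVs as Succeeded.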
I checked the cluster and found that there are multiple Subscriptions for the package 'mcg-operator'.

$ oc get subscriptions.operators.coreos.com
NAME                                                              PACKAGE        SOURCE             CHANNEL
mcg-operator                                                      mcg-operator   redhat-operators   stable-4.9
mcg-operator-stable-4.9-redhat-operators-openshift-marketplace    mcg-operator   redhat-operators   stable-4.9
ocs-operator                                                      ocs-operator   redhat-operators   stable-4.9
odf-operator                                                      odf-operator   redhat-operators   stable-4.9

This is the same issue we fixed earlier for the ocs-operator subscription in BZ#2014034. It is really difficult to fix this in the odf-operator at this point in time and backport it, and it is a one-time problem that only occurs while upgrading from 4.8 to 4.9. We can mark this as a known issue; the workaround to get out of this situation is simple: delete one of the two subscriptions (I would prefer to delete mcg-operator-stable-4.9-redhat-operators-openshift-marketplace), as shown below.
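A minimal sketch of that workaround, assuming the duplicate Subscription lives in the openshift-storage namespace like the other resources above:

$ oc delete subscription.operators.coreos.com mcg-operator-stable-4.9-redhat-operators-openshift-marketplace -n openshift-storage
$ oc get subscriptions.operators.coreos.com -n openshift-storage   # verify only one mcg-operator Subscription remains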
Here is what we know so far:
- It looks like the bug started with OCP 4.9.11.
- The impact is on customers who have OCP >= 4.9.11 and OCS 4.8 installed and who then try to upgrade OCS to ODF 4.9.0.
Diff between OCP 4.9.10 and 4.9.11: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.11
Out of that list, this one seems suspicious: https://bugzilla.redhat.com/show_bug.cgi?id=2024048
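A quick sketch of how to check whether a given cluster falls in the affected range (OCP >= 4.9.11 with OCS 4.8 installed), using standard oc commands:

$ oc get clusterversion version -o jsonpath='{.status.desired.version}'   # OCP version
$ oc get csv -n openshift-storage | grep ocs-operator                     # installed OCS/ODF operator version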
Build is not ready yet
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.9.1 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0032
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days