Bug 2034098 - [ODF workaround for OLM BZ 2035484] After OCS upgrade from 4.8 to ODF 4.9.0 or 4.9.1, storagesystem is not ready
Summary: [ODF workaround for OLM BZ 2035484] After OCS upgrade from 4.8 to ODF 4.9.0 or 4.9.1, storagesystem is not ready
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.9.1
Assignee: Nitin Goyal
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-20 06:17 UTC by Vijay Avuthu
Modified: 2023-12-08 04:27 UTC
CC List: 12 users

Fixed In Version: 4.9.1-257.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-05 16:24:05 UTC
Embargoed:




Links
Github red-hat-storage/odf-operator pull 158 (Merged): controllers: reconcile noobaa subscription before ocs-operator - 2021-12-23 20:10:42 UTC
Github red-hat-storage/odf-operator pull 159 (Merged): Bug 2034098: [release-4.9] controllers: reconcile noobaa subscription before ocs-operator - 2021-12-21 13:47:10 UTC
Red Hat Product Errata RHBA-2022:0032 - 2022-01-05 16:24:08 UTC

Description Vijay Avuthu 2021-12-20 06:17:11 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

After upgrading from OCS 4.8 to ODF 4.9.1, the storagesystem is not ready.


Version of all relevant components (if applicable):

upgrade from: OCS 4.8.6
upgrade to image: quay.io/rhceph-dev/ocs-registry:4.9.1-252.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Upgrade ODF from 4.8 to 4.9.
2. Check the storagesystem status (a command sketch follows below).


Actual results:

The storagesystem is not in the expected state: the Available condition stays False with reason Reconciling, and VendorCsvReady reports the mcg-operator and ocs-operator CSVs as not found.


Expected results:

The storagesystem should reach the Ready state (Available=True) after the upgrade.


Additional info:

storage system state:

$ oc get storagesystem -o yaml
apiVersion: v1
items:
- apiVersion: odf.openshift.io/v1alpha1
  kind: StorageSystem
  metadata:
    creationTimestamp: "2021-12-18T00:58:29Z"
    finalizers:
    - storagesystem.odf.openshift.io
    generation: 1
    managedFields:
    - apiVersion: odf.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"storagesystem.odf.openshift.io": {}
        f:spec:
          .: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
      manager: manager
      operation: Update
      time: "2021-12-18T00:58:29Z"
    - apiVersion: odf.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2021-12-18T00:58:29Z"
    name: ocs-storagecluster-storagesystem
    namespace: openshift-storage
    resourceVersion: "2287583"
    uid: 2902a8fe-0e6f-4a57-9171-92a012becb23
  spec:
    kind: storagecluster.ocs.openshift.io/v1
    name: ocs-storagecluster
    namespace: openshift-storage
  status:
    conditions:
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Reconcile is in progress
      reason: Reconciling
      status: "False"
      type: Available
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Reconcile is in progress
      reason: Reconciling
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: StorageSystem CR is valid
      reason: Valid
      status: "False"
      type: StorageSystemInvalid
    - lastHeartbeatTime: "2021-12-18T00:58:39Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: ClusterServiceVersion.operators.coreos.com "mcg-operator.v4.9.1" not found; ClusterServiceVersion.operators.coreos.com "ocs-operator.v4.9.1" not found
      reason: NotReady
      status: "False"
      type: VendorCsvReady
    - lastHeartbeatTime: "2021-12-18T00:58:29Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Initializing StorageSystem
      reason: Init
      status: Unknown
      type: VendorSystemPresent
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ 

> All CSVs are in Succeeded state
$ oc get csv
NAME                  DISPLAY                       VERSION   REPLACES              PHASE
mcg-operator.v4.9.1   NooBaa Operator               4.9.1     mcg-operator.v4.9.0   Succeeded
ocs-operator.v4.9.1   OpenShift Container Storage   4.9.1     ocs-operator.v4.8.6   Succeeded
odf-operator.v4.9.1   OpenShift Data Foundation     4.9.1     odf-operator.v4.9.0   Succeeded

> Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2647//consoleFull

Must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-124vu1cs33-ua/j-124vu1cs33-ua_20211217T233900/logs/failed_testcase_ocs_logs_1639786979/test_upgrade_ocs_logs/

Comment 3 Vijay Avuthu 2021-12-20 06:24:48 UTC
The cluster is alive for debugging if needed.

Comment 4 Nitin Goyal 2021-12-20 07:35:17 UTC
I checked the cluster and found that there are multiple Subscriptions for package 'mcg-operator'.

$ oc get subscriptions.operators.coreos.com 
NAME                                                             PACKAGE        SOURCE             CHANNEL
mcg-operator                                                     mcg-operator   redhat-operators   stable-4.9
mcg-operator-stable-4.9-redhat-operators-openshift-marketplace   mcg-operator   redhat-operators   stable-4.9
ocs-operator                                                     ocs-operator   redhat-operators   stable-4.9
odf-operator                                                     odf-operator   redhat-operators   stable-4.9

It is the same issue we fixed earlier for the ocs-operator subscription in BZ#2014034.

It is really difficult to fix this in the odf-operator and backport it at this point in time. This is a one-time problem while upgrading from 4.8 to 4.9. We can mark it as a known issue; the workaround to get out of this situation is very simple: delete one of the two subscriptions (I would prefer to delete mcg-operator-stable-4.9-redhat-operators-openshift-marketplace).
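A minimal sketch of that workaround, assuming the duplicate subscription name and the openshift-storage namespace from the output above:

# Delete the duplicate OLM-generated subscription; the plain mcg-operator subscription is kept
$ oc delete subscriptions.operators.coreos.com mcg-operator-stable-4.9-redhat-operators-openshift-marketplace -n openshift-storage

# Verify that only one subscription per package remains
$ oc get subscriptions.operators.coreos.com -n openshift-storage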

Comment 18 Elad 2021-12-23 19:47:01 UTC
Here is what we know so far:

- It looks like the bug started with OCP 4.9.11. 
- The impact is on customers who have OCP >= 4.9.11 with OCS 4.8 installed and then try to upgrade OCS to ODF 4.9.0 (a quick check sketch follows below).
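A quick way to check whether a cluster falls into that combination (a sketch; the CSV names will vary with the installed version):

# OCP version
$ oc get clusterversion version -o jsonpath='{.status.desired.version}'

# Installed storage operator CSVs
$ oc get csv -n openshift-storage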

Comment 19 Elad 2021-12-23 20:28:45 UTC
Diff between OCP 4.9.10 and 4.9.11 - https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.11
Out of that list, this one seems suspicious - https://bugzilla.redhat.com/show_bug.cgi?id=2024048

Comment 20 Mudit Agarwal 2021-12-24 03:31:18 UTC
Build is not ready yet

Comment 28 errata-xmlrpc 2022-01-05 16:24:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.9.1 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0032

Comment 29 Red Hat Bugzilla 2023-12-08 04:27:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

