Bug 2034098 - [ODF workaround for OLM BZ 2035484] After OCS upgrade from 4.8 to ODF 4.9.0 or 4.9.1, storagesystem is not ready
Summary: [ODF workaround for OLM BZ 2035484] After OCS upgrade from 4.8 to ODF 4.9.0 or 4.9.1, storagesystem is not ready
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.9.1
Assignee: Nitin Goyal
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-20 06:17 UTC by Vijay Avuthu
Modified: 2023-12-08 04:27 UTC
CC List: 12 users

Fixed In Version: 4.9.1-257.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-05 16:24:05 UTC
Embargoed:




Links
Github red-hat-storage/odf-operator pull 158 (Merged): controllers: reconcile noobaa subscription before ocs-operator - 2021-12-23 20:10:42 UTC
Github red-hat-storage/odf-operator pull 159 (Merged): Bug 2034098: [release-4.9] controllers: reconcile noobaa subscription before ocs-operator - 2021-12-21 13:47:10 UTC
Red Hat Product Errata RHBA-2022:0032 - 2022-01-05 16:24:08 UTC

Description Vijay Avuthu 2021-12-20 06:17:11 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

After upgrading from OCS 4.8 to ODF 4.9.1, the storagesystem is not ready.


Version of all relevant components (if applicable):

upgrade from: OCS 4.8.6
upgrade to image: quay.io/rhceph-dev/ocs-registry:4.9.1-252.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Upgrade ODF from 4.8 to 4.9.
2. Check the storagesystem status (a command sketch follows below).


Actual results:

The storagesystem is not in the expected state: the Available condition stays False with reason Reconciling, and VendorCsvReady reports the mcg-operator and ocs-operator CSVs as not found.


Expected results:

The storagesystem should reach the Ready state (Available=True) after the upgrade.


Additional info:

storage system state:

$ oc get storagesystem -o yaml
apiVersion: v1
items:
- apiVersion: odf.openshift.io/v1alpha1
  kind: StorageSystem
  metadata:
    creationTimestamp: "2021-12-18T00:58:29Z"
    finalizers:
    - storagesystem.odf.openshift.io
    generation: 1
    managedFields:
    - apiVersion: odf.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"storagesystem.odf.openshift.io": {}
        f:spec:
          .: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
      manager: manager
      operation: Update
      time: "2021-12-18T00:58:29Z"
    - apiVersion: odf.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2021-12-18T00:58:29Z"
    name: ocs-storagecluster-storagesystem
    namespace: openshift-storage
    resourceVersion: "2287583"
    uid: 2902a8fe-0e6f-4a57-9171-92a012becb23
  spec:
    kind: storagecluster.ocs.openshift.io/v1
    name: ocs-storagecluster
    namespace: openshift-storage
  status:
    conditions:
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Reconcile is in progress
      reason: Reconciling
      status: "False"
      type: Available
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Reconcile is in progress
      reason: Reconciling
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2021-12-20T06:07:03Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: StorageSystem CR is valid
      reason: Valid
      status: "False"
      type: StorageSystemInvalid
    - lastHeartbeatTime: "2021-12-18T00:58:39Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: ClusterServiceVersion.operators.coreos.com "mcg-operator.v4.9.1" not found; ClusterServiceVersion.operators.coreos.com "ocs-operator.v4.9.1" not found
      reason: NotReady
      status: "False"
      type: VendorCsvReady
    - lastHeartbeatTime: "2021-12-18T00:58:29Z"
      lastTransitionTime: "2021-12-18T00:58:29Z"
      message: Initializing StorageSystem
      reason: Init
      status: Unknown
      type: VendorSystemPresent
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ 

> All CSVs are in Succeeded state
$ oc get csv
NAME                  DISPLAY                       VERSION   REPLACES              PHASE
mcg-operator.v4.9.1   NooBaa Operator               4.9.1     mcg-operator.v4.9.0   Succeeded
ocs-operator.v4.9.1   OpenShift Container Storage   4.9.1     ocs-operator.v4.8.6   Succeeded
odf-operator.v4.9.1   OpenShift Data Foundation     4.9.1     odf-operator.v4.9.0   Succeeded

> Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2647//consoleFull

Must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-124vu1cs33-ua/j-124vu1cs33-ua_20211217T233900/logs/failed_testcase_ocs_logs_1639786979/test_upgrade_ocs_logs/

Comment 3 Vijay Avuthu 2021-12-20 06:24:48 UTC
The cluster is alive for debugging if needed.

Comment 4 Nitin Goyal 2021-12-20 07:35:17 UTC
I checked the cluster and found that there are multiple Subscriptions for package 'mcg-operator'.

$ oc get subscriptions.operators.coreos.com 
NAME                                                             PACKAGE        SOURCE             CHANNEL
mcg-operator                                                     mcg-operator   redhat-operators   stable-4.9
mcg-operator-stable-4.9-redhat-operators-openshift-marketplace   mcg-operator   redhat-operators   stable-4.9
ocs-operator                                                     ocs-operator   redhat-operators   stable-4.9
odf-operator                                                     odf-operator   redhat-operators   stable-4.9

It is the same issue we fixed earlier for the ocs-operator subscription in BZ#2014034.

It is really difficult to fix this in the odf-operator and backport it at this point in time. This is a one-time problem while upgrading from 4.8 to 4.9. We can mark it as a known issue; the workaround to get out of this situation is very simple: delete one of the two subscriptions (I would prefer to delete mcg-operator-stable-4.9-redhat-operators-openshift-marketplace).
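A minimal sketch of that workaround, assuming the duplicate subscription name and the openshift-storage namespace from the output above:

# Delete the duplicate OLM-generated subscription; the plain mcg-operator subscription is kept
$ oc delete subscriptions.operators.coreos.com mcg-operator-stable-4.9-redhat-operators-openshift-marketplace -n openshift-storage

# Verify that only one subscription per package remains
$ oc get subscriptions.operators.coreos.com -n openshift-storage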

Comment 18 Elad 2021-12-23 19:47:01 UTC
Here is what we know so far:

- It looks like the bug started with OCP 4.9.11. 
- The impact is on customers who have OCP >= 4.9.11 with OCS 4.8 installed and then try to upgrade OCS to ODF 4.9.0 (a quick check sketch follows below).
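A quick way to check whether a cluster falls into that combination (a sketch; the CSV names will vary with the installed version):

# OCP version
$ oc get clusterversion version -o jsonpath='{.status.desired.version}'

# Installed storage operator CSVs
$ oc get csv -n openshift-storage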

Comment 19 Elad 2021-12-23 20:28:45 UTC
Diff between OCP 4.9.10 and 4.9.11 - https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.11
Out of that list, this one seems suspicious - https://bugzilla.redhat.com/show_bug.cgi?id=2024048

Comment 20 Mudit Agarwal 2021-12-24 03:31:18 UTC
Build is not ready yet

Comment 28 errata-xmlrpc 2022-01-05 16:24:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.9.1 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0032

Comment 29 Red Hat Bugzilla 2023-12-08 04:27:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

