Bug 2000143 - OCS 4.8 to ODF 4.9 upgrade failed on OCP 4.9 AWS cluster
Summary: OCS 4.8 to ODF 4.9 upgrade failed on OCP 4.9 AWS cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: Jose A. Rivera
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-01 13:59 UTC by Aman Agrawal
Modified: 2023-08-09 17:00 UTC
CC List: 13 users

Fixed In Version: v4.9.0-182.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-13 17:45:30 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage odf-operator pull 102 0 None open Bug 2000143: [release-4.9] Manage dependency Subscriptions directly 2021-09-29 14:08:18 UTC
Github red-hat-storage odf-operator pull 99 0 None Merged Manage dependency Subscriptions directly 2021-09-29 12:59:27 UTC
Red Hat Product Errata RHSA-2021:5086 0 None None None 2021-12-13 17:45:43 UTC

Comment 3 Elad 2021-09-01 14:10:31 UTC
Marking as a regression, since the product upgrade used to work :)

Comment 4 Nitin Goyal 2021-09-01 14:14:53 UTC
Populating some info that might be useful for anyone who does not have access to the setup:

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.1-177.ci   OpenShift Container Storage   4.8.1-177.ci              Succeeded


$ oc get packagemanifests.packages.operators.coreos.com | grep 'ocs\|odf\|noobaa'
odf-operator                                         OpenShift Data Foundation     6h18m
odf-multicluster-orchestrator                        OpenShift Data Foundation     6h18m
noobaa-operator                                      OpenShift Data Foundation     6h18m
ocs-operator                                         Red Hat Operators             8h
ocs-operator                                         Openshift Container Storage   7h47m
ocs-operator                                         OpenShift Data Foundation     6h18m


$ oc get subscriptions.operators.coreos.com 
NAME           PACKAGE        SOURCE              CHANNEL
ocs-operator   ocs-operator   ocs-catalogsource   stable-4.8
odf-operator   odf-operator   odf-catalogsource   stable-4.9


$ oc describe subscriptions.operators.coreos.com odf-operator
Spec:
  Channel:                stable-4.9
  Install Plan Approval:  Automatic
  Name:                   odf-operator
  Source:                 odf-catalogsource
  Source Namespace:       openshift-marketplace
  Starting CSV:           odf-operator.v4.9.0-120.ci
Status:
  Conditions:
    Last Transition Time:  2021-09-01T07:54:12Z
    Message:               all available catalogsources are healthy
    Reason:                AllCatalogSourcesHealthy
    Status:                False
    Type:                  CatalogSourcesUnhealthy
    Message:               constraints not satisfiable: subscription odf-operator requires odf-catalogsource/openshift-marketplace/stable-4.9/odf-operator.v4.9.0-120.ci, subscription odf-operator exists, subscription ocs-operator exists, subscription ocs-operator requires @existing/openshift-storage//ocs-operator.v4.8.1-177.ci, redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1, @existing/openshift-storage//ocs-operator.v4.8.1-177.ci, ocs-catalogsource/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1-177.ci, odf-catalogsource/openshift-marketplace/stable-4.9/ocs-operator.v4.9.0-120.ci and redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.0 provide VolumeReplication (replication.storage.openshift.io/v1alpha1), bundle odf-operator.v4.9.0-120.ci requires an operator with package: ocs-operator and with version in range: 4.9.0-120.ci
    Reason:                ConstraintsNotSatisfiable
    Status:                True
    Type:                  ResolutionFailed


$ oc describe subscriptions.operators.coreos.com ocs-operator
Spec:
  Channel:           stable-4.8
  Name:              ocs-operator
  Source:            ocs-catalogsource
  Source Namespace:  openshift-marketplace
Status:
  Conditions:
    Last Transition Time:   2021-09-01T07:54:45Z
    Message:                all available catalogsources are healthy
    Reason:                 AllCatalogSourcesHealthy
    Status:                 False
    Type:                   CatalogSourcesUnhealthy
    Message:                constraints not satisfiable: subscription odf-operator exists, subscription odf-operator requires odf-catalogsource/openshift-marketplace/stable-4.9/odf-operator.v4.9.0-120.ci, bundle odf-operator.v4.9.0-120.ci requires an operator with package: ocs-operator and with version in range: 4.9.0-120.ci, subscription ocs-operator requires @existing/openshift-storage//ocs-operator.v4.8.1-177.ci, subscription ocs-operator exists, ocs-catalogsource/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1-177.ci, odf-catalogsource/openshift-marketplace/stable-4.9/ocs-operator.v4.9.0-120.ci, redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.0, redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1 and @existing/openshift-storage//ocs-operator.v4.8.1-177.ci provide VolumeReplicationClass (replication.storage.openshift.io/v1alpha1)
    Reason:                 ConstraintsNotSatisfiable
    Status:                 True
    Type:                   ResolutionFailed
  Current CSV:              ocs-operator.v4.8.1-177.ci
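The failed-resolution state shown in the two `oc describe` outputs above can also be detected programmatically from `oc get subscription <name> -n openshift-storage -o json`. A minimal sketch (the sample data below is modeled on the conditions above, not dumped from the cluster):

```python
import json

# Sample Subscription status modeled on the `oc describe` output above.
subscription_json = json.dumps({
    "status": {
        "conditions": [
            {"type": "CatalogSourcesUnhealthy", "status": "False",
             "reason": "AllCatalogSourcesHealthy"},
            {"type": "ResolutionFailed", "status": "True",
             "reason": "ConstraintsNotSatisfiable",
             "message": "constraints not satisfiable: ..."},
        ]
    }
})

def resolution_failed(sub_json: str):
    """Return the ResolutionFailed condition if its status is True, else None."""
    sub = json.loads(sub_json)
    for cond in sub.get("status", {}).get("conditions", []):
        if cond.get("type") == "ResolutionFailed" and cond.get("status") == "True":
            return cond
    return None

cond = resolution_failed(subscription_json)
if cond:
    print(f"{cond['reason']}: {cond['message']}")
```

A Subscription stuck in this state, as seen here, never progresses until the conflicting constraint is removed.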

Comment 5 umanga 2021-09-08 06:11:58 UTC
My observation on this issue is that you cannot install ODF 4.9 on clusters that already have OCS 4.8 installed.
To do so, we need to upgrade OCS 4.8 to 4.9 first and then install ODF 4.9 (other workarounds require code & build changes).

The issue is a conflicting dependency.
The user subscribes to OCS 4.8, and OLM tries to satisfy that requirement.
The user then tries to install ODF 4.9, which requires OCS 4.9.
So now OLM has two conflicting user requests to satisfy.
It can't automatically upgrade OCS 4.8 to 4.9 because the user explicitly installed 4.8.
So the install of ODF 4.9 hangs.

It looks like an automatic upgrade from OCS 4.8 to 4.9 will not be possible.
It'll be a 2-step manual process:

1. Upgrade OCS 4.8 to 4.9
2. Install ODF 4.9
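For illustration, the two steps could be expressed as Subscription manifests like the following. This is a sketch only: the namespace, catalog source, and channel names are taken from the outputs above, and the manifests have not been tested against a cluster.

```yaml
# Step 1: point the existing ocs-operator Subscription at the 4.9 channel.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ocs-operator
  namespace: openshift-storage
spec:
  name: ocs-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  channel: stable-4.9        # changed from stable-4.8
---
# Step 2: once OCS is on 4.9, create the odf-operator Subscription.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  name: odf-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  channel: stable-4.9
```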

Comment 6 Nitin Goyal 2021-09-08 06:47:37 UTC
There is another approach that QE can also try; let us know the results.

1. Install ODF 4.9 by adding an ODF subscription (it will be Pending).
2. Change the OCS subscription from 4.8 to 4.9.

I am suggesting this approach because of the NooBaa dependency, which was removed from the ocs-operator and added to the odf-operator.

We should try out both approaches and observe the behaviour.
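The ordering above could be sketched as two CLI steps (illustrative only, not verified against a cluster; the manifest filename is hypothetical):

```shell
# Step 1: create the ODF 4.9 subscription first; it will stay Pending.
oc apply -f odf-subscription.yaml   # odf-operator, channel stable-4.9

# Step 2: switch the existing OCS subscription from stable-4.8 to stable-4.9.
oc patch subscription ocs-operator -n openshift-storage \
  --type merge -p '{"spec":{"channel":"stable-4.9"}}'
```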

Comment 12 Jose A. Rivera 2021-09-13 16:20:15 UTC
It's certainly not okay that this is still in NEW. I'll be tackling this today, since it'll also involve some changes to our upstream automation. I sure as heck am not going to try and repeatedly test this manually.

Comment 16 Nitin Goyal 2021-09-17 09:49:55 UTC
@amagrawa Can you also paste `oc get subscriptions`

Comment 18 Nitin Goyal 2021-09-17 09:57:37 UTC
Looks good to me.

Comment 25 Petr Balogh 2021-10-11 12:16:15 UTC
PR to remove the current workaround in ocs-ci for upgrade is here:
https://github.com/red-hat-storage/ocs-ci/pull/4945/files

Trying to verify the BZ here:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-aws-ipi-3az-rhcos-3m-3w-upgrade-ocs-auto/85/

Here I am deploying the cluster and will also try the UI flow, as the job will pause before the upgrade and I will continue manually:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-aws-ipi-3az-rhcos-3m-3w-upgrade-ocs-auto/86/

Comment 26 Petr Balogh 2021-10-11 15:18:23 UTC
I've tried the UI upgrade to see how the upgrade behaves now. Also opened a chat thread here: https://chat.google.com/room/AAAAEDRLC3U/34bBUka8mig

Summary:

I installed OCS 4.8.2 on top of OCP 4.9
Disabled default sources
Created a custom catalog source with the redhat-operators name
Subscribed to ODF 4.9
It told me that a StorageSystem is required; at first the button was inactive (grayed out),
but after about 1 minute it became active and I could click it and start creating a StorageSystem as in a new installation.
I would expect this step not to be allowed, and the StorageSystem to be created automatically when a StorageCluster already exists from the OCS 4.8 installation.

$ oc get csv -n openshift-storage
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.8.2   OpenShift Container Storage   4.8.2                Succeeded
odf-operator.v4.9.0   OpenShift Data Foundation     4.9.0                Succeeded

OCS is still 4.8.2. I didn't finish the StorageSystem creation wizard, as it doesn't make sense to continue as on a fresh deployment when I already have a StorageCluster created.
Can someone please explain to me how this is going to be resolved?

I will attach screenshots of the flow performed above.

Comment 29 Petr Balogh 2021-10-14 12:57:55 UTC
Looks like the StorageSystem is now created OK.

The only remaining thing is that the UI still shows the user a button to create a StorageSystem, but if the user clicks it, it only allows creating one for IBM Flash. So I think it's OK.

Did the whole testing from the UI.

In the background I checked from the CLI what was happening:

pbalogh@pbalogh-mac bug-storageCluster $ oc get subscription -n openshift-storage
NAME           PACKAGE        SOURCE             CHANNEL
ocs-operator   ocs-operator   redhat-operators   stable-4.8

pbalogh@pbalogh-mac bug-storageCluster $ oc get csv -n openshift-storage
NAME                     DISPLAY                       VERSION   REPLACES              PHASE
noobaa-operator.v4.9.0   NooBaa Operator               4.9.0                           Pending
ocs-operator.v4.8.2      OpenShift Container Storage   4.8.2                           Replacing
ocs-operator.v4.9.0      OpenShift Container Storage   4.9.0     ocs-operator.v4.8.2   Pending
odf-operator.v4.9.0      OpenShift Data Foundation     4.9.0                           Succeeded
pbalogh@pbalogh-mac bug-storageCluster $ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-jv7bz                                            3/3     Running     0          20m
csi-cephfsplugin-provisioner-779b49799d-cjnhb                     6/6     Running     0          20m
csi-cephfsplugin-provisioner-779b49799d-vs4r2                     6/6     Running     0          20m
csi-cephfsplugin-vxcsd                                            3/3     Running     0          20m
csi-cephfsplugin-wng9f                                            3/3     Running     0          20m
csi-rbdplugin-provisioner-859c66d84c-4sgbz                        6/6     Running     0          20m
csi-rbdplugin-provisioner-859c66d84c-99mgr                        6/6     Running     0          20m
csi-rbdplugin-t747g                                               3/3     Running     0          20m
csi-rbdplugin-wm7d2                                               3/3     Running     0          20m
csi-rbdplugin-zt4hk                                               3/3     Running     0          20m
noobaa-core-0                                                     1/1     Running     0          9m32s
noobaa-db-pg-0                                                    1/1     Running     0          9m32s
noobaa-endpoint-9d78b8765-56wp4                                   1/1     Running     0          8m6s
noobaa-operator-7dd7947864-2slcx                                  1/1     Running     0          22m
ocs-metrics-exporter-6c7c475cb7-q9xnf                             1/1     Running     0          22m
ocs-operator-5997857669-clwk8                                     1/1     Running     0          22m
odf-console-86f754777d-qg78p                                      1/1     Running     0          43s
odf-operator-controller-manager-8998f8c96-l6nrk                   2/2     Running     0          43s
rook-ceph-crashcollector-04eaaebff1e6c46ea57254eec81feaec-nqmbs   1/1     Running     0          10m
rook-ceph-crashcollector-666fac79ec87e8d94ce46ce74ba6005f-ncpcf   1/1     Running     0          10m
rook-ceph-crashcollector-d5c6ccdcec1a0c0065186ee8bb5cd245-jdssk   1/1     Running     0          9m33s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5db988fd5nltj   2/2     Running     0          9m12s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7f6756bbw97t4   2/2     Running     0          9m11s
rook-ceph-mgr-a-5ddd548656-zwvlq                                  2/2     Running     0          10m
rook-ceph-mon-a-f464b9b86-djl2r                                   2/2     Running     0          19m
rook-ceph-mon-b-7d6fb4468d-5xmmg                                  2/2     Running     0          15m
rook-ceph-mon-c-d74fbf9bb-swgmw                                   2/2     Running     0          12m
rook-ceph-operator-5b4fccd558-848gh                               1/1     Running     0          22m
rook-ceph-osd-0-66978fd5bd-st8sx                                  2/2     Running     0          9m51s
rook-ceph-osd-1-68d96b456-9wmm7                                   2/2     Running     0          9m39s
rook-ceph-osd-2-858d786567-dm2km                                  2/2     Running     0          9m33s
rook-ceph-osd-prepare-ocs-deviceset-gp2-0-data-0d66wf--1-5hp9t    0/1     Completed   0          10m
rook-ceph-osd-prepare-ocs-deviceset-gp2-1-data-0k9fg5--1-l66dm    0/1     Completed   0          10m
rook-ceph-osd-prepare-ocs-deviceset-gp2-2-data-0nzfvs--1-mjd7n    0/1     Completed   0          10m


pbalogh@pbalogh-mac bug-storageCluster $ oc get csv -n openshift-storage
NAME                     DISPLAY                       VERSION   REPLACES              PHASE
noobaa-operator.v4.9.0   NooBaa Operator               4.9.0                           Succeeded
ocs-operator.v4.9.0      OpenShift Container Storage   4.9.0     ocs-operator.v4.8.2   Succeeded
odf-operator.v4.9.0      OpenShift Data Foundation     4.9.0                           Succeeded


# You can see the StorageSystem was automatically created:

pbalogh@pbalogh-mac bug-storageCluster $ oc get storageSystem -n openshift-storage
NAME                               STORAGE-SYSTEM-KIND                  STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem   storagecluster.ocs.openshift.io/v1   ocs-storagecluster
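For reference, a StorageSystem CR matching the output above would look roughly like this. It is a sketch reconstructed from the names shown, not dumped from the cluster:

```yaml
apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
metadata:
  name: ocs-storagecluster-storagesystem
  namespace: openshift-storage
spec:
  kind: storagecluster.ocs.openshift.io/v1   # the STORAGE-SYSTEM-KIND column above
  name: ocs-storagecluster                   # the existing StorageCluster
  namespace: openshift-storage
```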


pbalogh@pbalogh-mac bug-storageCluster $ oc get pod -n openshift-storage
NAME                                                              READY   STATUS        RESTARTS   AGE
csi-cephfsplugin-6t8pg                                            3/3     Running       0          2m13s
csi-cephfsplugin-d52cr                                            3/3     Running       0          96s
csi-cephfsplugin-provisioner-5c576d45fd-g7xvp                     6/6     Running       0          2m11s
csi-cephfsplugin-provisioner-5c576d45fd-pdwh9                     6/6     Running       0          2m11s
csi-cephfsplugin-qxd7t                                            3/3     Running       0          2m6s
csi-rbdplugin-44mfq                                               3/3     Running       0          99s
csi-rbdplugin-gvcfh                                               3/3     Running       0          118s
csi-rbdplugin-provisioner-65cffcfcc6-kckt6                        6/6     Running       0          2m14s
csi-rbdplugin-provisioner-65cffcfcc6-vh7s7                        6/6     Running       0          2m14s
csi-rbdplugin-zjtgz                                               3/3     Running       0          2m17s
noobaa-core-0                                                     1/1     Running       0          55s
noobaa-db-pg-0                                                    1/1     Running       0          84s
noobaa-endpoint-6785755654-nlhq7                                  1/1     Terminating   0          2m9s
noobaa-endpoint-7dff67f58-dmwfl                                   1/1     Running       0          28s
noobaa-operator-67cb9f49d5-96sjs                                  1/1     Running       0          2m19s
ocs-metrics-exporter-967cf6678-5dsch                              1/1     Running       0          2m35s
ocs-operator-5f8f466f96-lq2j9                                     1/1     Running       0          2m34s
odf-console-86f754777d-qg78p                                      1/1     Running       0          3m26s
odf-operator-controller-manager-8998f8c96-l6nrk                   2/2     Running       0          3m26s
rook-ceph-crashcollector-04eaaebff1e6c46ea57254eec81feaec-fgfv2   1/1     Running       0          2m28s
rook-ceph-crashcollector-666fac79ec87e8d94ce46ce74ba6005f-kc5kj   1/1     Running       0          2m28s
rook-ceph-crashcollector-d5c6ccdcec1a0c0065186ee8bb5cd245-rsx89   1/1     Running       0          2m28s
rook-ceph-detect-version--1-g5qvn                                 0/1     Init:0/1      0          3s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-d5949b56s5sz5   2/2     Running       0          44s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6c9d847b85qkx   2/2     Running       0          34s
rook-ceph-mgr-a-5ddd548656-zwvlq                                  2/2     Running       0          13m
rook-ceph-mon-a-69756fd649-cjp75                                  2/2     Running       0          2m5s
rook-ceph-mon-b-75b64c9b7-kj555                                   2/2     Running       0          97s
rook-ceph-mon-c-849fc857df-9mgrk                                  2/2     Running       0          27s
rook-ceph-operator-5688f5b8d-j5tfp                                1/1     Running       0          2m35s
rook-ceph-osd-0-66978fd5bd-st8sx                                  2/2     Running       0          12m
rook-ceph-osd-1-68d96b456-9wmm7                                   2/2     Running       0          12m
rook-ceph-osd-2-858d786567-dm2km                                  2/2     Running       0          12m
rook-ceph-osd-prepare-ocs-deviceset-gp2-0-data-0d66wf--1-5hp9t    0/1     Completed     0          12m
rook-ceph-osd-prepare-ocs-deviceset-gp2-1-data-0k9fg5--1-l66dm    0/1     Completed     0          12m
rook-ceph-osd-prepare-ocs-deviceset-gp2-2-data-0nzfvs--1-mjd7n    0/1     Completed     0          12m


I also recorded the whole upgrade, so once I get a link to the recording I can share it if you want to see how it went.

Marking as verified.

Comment 32 errata-xmlrpc 2021-12-13 17:45:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086

