Bug 2000143
| Summary: | OCS 4.8 to ODF 4.9 upgrade failed on OCP 4.9 AWS cluster | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Aman Agrawal <amagrawa> |
| Component: | odf-operator | Assignee: | Jose A. Rivera <jarrpa> |
| Status: | CLOSED ERRATA | QA Contact: | Petr Balogh <pbalogh> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.9 | CC: | assingh, ebenahar, jarrpa, jijoy, jopinto, muagarwa, nigoyal, ocs-bugs, odf-bz-bot, pbalogh, rperiyas, uchapaga, vavuthu |
| Target Milestone: | --- | Keywords: | Regression, UpgradeBlocker |
| Target Release: | ODF 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | v4.9.0-182.ci | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-12-13 17:45:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Comment 3
Elad
2021-09-01 14:10:31 UTC
Populating some info which might be useful for someone who does not have access to the setup:
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
ocs-operator.v4.8.1-177.ci OpenShift Container Storage 4.8.1-177.ci Succeeded
$ oc get packagemanifests.packages.operators.coreos.com | grep 'ocs\|odf\|noobaa'
odf-operator OpenShift Data Foundation 6h18m
odf-multicluster-orchestrator OpenShift Data Foundation 6h18m
noobaa-operator OpenShift Data Foundation 6h18m
ocs-operator Red Hat Operators 8h
ocs-operator Openshift Container Storage 7h47m
ocs-operator OpenShift Data Foundation 6h18m
$ oc get subscriptions.operators.coreos.com
NAME PACKAGE SOURCE CHANNEL
ocs-operator ocs-operator ocs-catalogsource stable-4.8
odf-operator odf-operator odf-catalogsource stable-4.9
$ oc describe subscriptions.operators.coreos.com odf-operator
Spec:
Channel: stable-4.9
Install Plan Approval: Automatic
Name: odf-operator
Source: odf-catalogsource
Source Namespace: openshift-marketplace
Starting CSV: odf-operator.v4.9.0-120.ci
Status:
Conditions:
Last Transition Time: 2021-09-01T07:54:12Z
Message: all available catalogsources are healthy
Reason: AllCatalogSourcesHealthy
Status: False
Type: CatalogSourcesUnhealthy
Message: constraints not satisfiable: subscription odf-operator requires odf-catalogsource/openshift-marketplace/stable-4.9/odf-operator.v4.9.0-120.ci, subscription odf-operator exists, subscription ocs-operator exists, subscription ocs-operator requires @existing/openshift-storage//ocs-operator.v4.8.1-177.ci, redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1, @existing/openshift-storage//ocs-operator.v4.8.1-177.ci, ocs-catalogsource/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1-177.ci, odf-catalogsource/openshift-marketplace/stable-4.9/ocs-operator.v4.9.0-120.ci and redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.0 provide VolumeReplication (replication.storage.openshift.io/v1alpha1), bundle odf-operator.v4.9.0-120.ci requires an operator with package: ocs-operator and with version in range: 4.9.0-120.ci
Reason: ConstraintsNotSatisfiable
Status: True
Type: ResolutionFailed
$ oc describe subscriptions.operators.coreos.com ocs-operator
Spec:
Channel: stable-4.8
Name: ocs-operator
Source: ocs-catalogsource
Source Namespace: openshift-marketplace
Status:
Conditions:
Last Transition Time: 2021-09-01T07:54:45Z
Message: all available catalogsources are healthy
Reason: AllCatalogSourcesHealthy
Status: False
Type: CatalogSourcesUnhealthy
Message: constraints not satisfiable: subscription odf-operator exists, subscription odf-operator requires odf-catalogsource/openshift-marketplace/stable-4.9/odf-operator.v4.9.0-120.ci, bundle odf-operator.v4.9.0-120.ci requires an operator with package: ocs-operator and with version in range: 4.9.0-120.ci, subscription ocs-operator requires @existing/openshift-storage//ocs-operator.v4.8.1-177.ci, subscription ocs-operator exists, ocs-catalogsource/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1-177.ci, odf-catalogsource/openshift-marketplace/stable-4.9/ocs-operator.v4.9.0-120.ci, redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.0, redhat-operators/openshift-marketplace/stable-4.8/ocs-operator.v4.8.1 and @existing/openshift-storage//ocs-operator.v4.8.1-177.ci provide VolumeReplicationClass (replication.storage.openshift.io/v1alpha1)
Reason: ConstraintsNotSatisfiable
Status: True
Type: ResolutionFailed
Current CSV: ocs-operator.v4.8.1-177.ci
My observation on this issue: you cannot install ODF 4.9 on clusters that already have OCS 4.8 installed. To do so, we need to upgrade OCS 4.8 to 4.9 and then install ODF 4.9 (other workarounds require code and build changes). The issue is due to a conflicting dependency: the user subscribes to OCS 4.8 and OLM tries to satisfy that requirement; then the user tries to install ODF 4.9, which requires OCS 4.9. Now OLM does not know which user request to satisfy. It can't automatically upgrade OCS 4.8 to 4.9 because the user explicitly installed 4.8, so the install of ODF 4.9 hangs.
It looks like an automatic upgrade from OCS 4.8 to 4.9 will not be possible. It will be a 2-step manual process:
1. Upgrade OCS 4.8 to 4.9.
2. Install ODF 4.9.
There is another approach QE can also give a try and let us know the results:
1. Install ODF 4.9 by adding an ODF subscription (it will be pending).
2. Change the OCS subscription from 4.8 to 4.9.
I am suggesting this approach because of the noobaa dependency, which was removed from the ocs-operator and added to the odf-operator. We should try out both approaches and observe the behaviour.
It's certainly not okay that this is still in NEW. I'll be tackling this today, since it'll also involve some changes to our upstream automation. I sure as heck am not going to try and repeatedly test this manually.
@amagrawa Can you also paste `oc get subscriptions`?
Looks good to me.
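The 2-step manual process described above can be sketched from the CLI. This is only a sketch: the subscription names, channels, catalog source, and `openshift-storage` namespace are taken from the outputs in this bug, and the `oc` shell-function stub at the top exists purely so the script can run without a cluster (delete it to run for real).

```shell
#!/bin/sh
# Dry-run stub: prints each 'oc' invocation instead of executing it,
# so this sketch runs without a cluster. Delete it to run for real.
oc() { echo "would run: oc $*"; }

# Step 1: move the existing OCS subscription from stable-4.8 to stable-4.9.
oc patch subscription ocs-operator -n openshift-storage --type merge \
  -p '{"spec":{"channel":"stable-4.9"}}'

# Step 2: once OCS has settled on 4.9, subscribe to ODF 4.9.
# Catalog source name 'odf-catalogsource' matches this bug's setup.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: stable-4.9
  name: odf-operator
  source: odf-catalogsource
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF
```

With the stub in place the script only prints the commands it would run, which makes it easy to review the exact `oc` invocations before pointing it at a real cluster.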
PR to remove the current WA in ocs-ci for upgrade is here: https://github.com/red-hat-storage/ocs-ci/pull/4945/files
Trying to verify the BZ here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-aws-ipi-3az-rhcos-3m-3w-upgrade-ocs-auto/85/
Here I am deploying the cluster and will also try the UI flow, as the job will pause before the upgrade and I will continue manually: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-aws-ipi-3az-rhcos-3m-3w-upgrade-ocs-auto/86/
I've tried to do the UI upgrade to see how the upgrade behaves now. Opened a chat thread here as well: https://chat.google.com/room/AAAAEDRLC3U/34bBUka8mig
Summary:
1. I installed OCS 4.8.2 on top of OCP 4.9.
2. Disabled default sources.
3. Created a custom catalog source with the redhat-operators name.
4. Subscribed to ODF 4.9.
It told me that a storageSystem is required; at first the button was inactive (grayed out), and after about 1 minute it became active and I could click it and start creating a StorageSystem like in a new installation. I would expect that this step would not be allowed and that the storageSystem would be created automatically when a StorageCluster already exists from the OCS 4.8 installation.
$ oc get csv -n openshift-storage
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.8.2   OpenShift Container Storage   4.8.2                Succeeded
odf-operator.v4.9.0   OpenShift Data Foundation     4.9.0                Succeeded
OCS is still 4.8.2. I didn't finish the wizard for creation of the StorageSystem, as it doesn't make any sense to continue as on a fresh deployment when I already have a storageCluster created. Can someone please explain to me how this is going to be resolved? I will attach screenshots of the flow performed above.
Looks like the storage system is now created OK.
The only remaining thing is that the UI still shows the user the button to create a storage system, but if the user clicks it, it only allows creating one for IBM Flash. So I think it's OK.
I did the whole testing from the UI.
In the background I checked from the CLI what was happening:
pbalogh@pbalogh-mac bug-storageCluster $ oc get subscription -n openshift-storage
NAME PACKAGE SOURCE CHANNEL
ocs-operator ocs-operator redhat-operators stable-4.8
pbalogh@pbalogh-mac bug-storageCluster $ oc get csv -n openshift-storage
NAME DISPLAY VERSION REPLACES PHASE
noobaa-operator.v4.9.0 NooBaa Operator 4.9.0 Pending
ocs-operator.v4.8.2 OpenShift Container Storage 4.8.2 Replacing
ocs-operator.v4.9.0 OpenShift Container Storage 4.9.0 ocs-operator.v4.8.2 Pending
odf-operator.v4.9.0      OpenShift Data Foundation     4.9.0                         Succeeded
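While CSVs cycle through Pending/Replacing like in the snapshot above, a quick way to watch them settle is a jsonpath listing of name/phase pairs. A minimal sketch; the `oc` stub at the top only makes it runnable without a cluster (delete it to run for real).

```shell
#!/bin/sh
# Dry-run stub so the sketch runs without a cluster; delete to run for real.
oc() { echo "would run: oc $*"; }

# List each CSV with its phase. During the upgrade this shows
# Pending/Replacing; once OLM finishes the swap, everything reads Succeeded.
oc get csv -n openshift-storage \
  -o 'jsonpath={range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
```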
pbalogh@pbalogh-mac bug-storageCluster $ oc get pod -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-jv7bz 3/3 Running 0 20m
csi-cephfsplugin-provisioner-779b49799d-cjnhb 6/6 Running 0 20m
csi-cephfsplugin-provisioner-779b49799d-vs4r2 6/6 Running 0 20m
csi-cephfsplugin-vxcsd 3/3 Running 0 20m
csi-cephfsplugin-wng9f 3/3 Running 0 20m
csi-rbdplugin-provisioner-859c66d84c-4sgbz 6/6 Running 0 20m
csi-rbdplugin-provisioner-859c66d84c-99mgr 6/6 Running 0 20m
csi-rbdplugin-t747g 3/3 Running 0 20m
csi-rbdplugin-wm7d2 3/3 Running 0 20m
csi-rbdplugin-zt4hk 3/3 Running 0 20m
noobaa-core-0 1/1 Running 0 9m32s
noobaa-db-pg-0 1/1 Running 0 9m32s
noobaa-endpoint-9d78b8765-56wp4 1/1 Running 0 8m6s
noobaa-operator-7dd7947864-2slcx 1/1 Running 0 22m
ocs-metrics-exporter-6c7c475cb7-q9xnf 1/1 Running 0 22m
ocs-operator-5997857669-clwk8 1/1 Running 0 22m
odf-console-86f754777d-qg78p 1/1 Running 0 43s
odf-operator-controller-manager-8998f8c96-l6nrk 2/2 Running 0 43s
rook-ceph-crashcollector-04eaaebff1e6c46ea57254eec81feaec-nqmbs 1/1 Running 0 10m
rook-ceph-crashcollector-666fac79ec87e8d94ce46ce74ba6005f-ncpcf 1/1 Running 0 10m
rook-ceph-crashcollector-d5c6ccdcec1a0c0065186ee8bb5cd245-jdssk 1/1 Running 0 9m33s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5db988fd5nltj 2/2 Running 0 9m12s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7f6756bbw97t4 2/2 Running 0 9m11s
rook-ceph-mgr-a-5ddd548656-zwvlq 2/2 Running 0 10m
rook-ceph-mon-a-f464b9b86-djl2r 2/2 Running 0 19m
rook-ceph-mon-b-7d6fb4468d-5xmmg 2/2 Running 0 15m
rook-ceph-mon-c-d74fbf9bb-swgmw 2/2 Running 0 12m
rook-ceph-operator-5b4fccd558-848gh 1/1 Running 0 22m
rook-ceph-osd-0-66978fd5bd-st8sx 2/2 Running 0 9m51s
rook-ceph-osd-1-68d96b456-9wmm7 2/2 Running 0 9m39s
rook-ceph-osd-2-858d786567-dm2km 2/2 Running 0 9m33s
rook-ceph-osd-prepare-ocs-deviceset-gp2-0-data-0d66wf--1-5hp9t 0/1 Completed 0 10m
rook-ceph-osd-prepare-ocs-deviceset-gp2-1-data-0k9fg5--1-l66dm 0/1 Completed 0 10m
rook-ceph-osd-prepare-ocs-deviceset-gp2-2-data-0nzfvs--1-mjd7n 0/1 Completed 0 10m
pbalogh@pbalogh-mac bug-storageCluster $ oc get csv -n openshift-storage
NAME DISPLAY VERSION REPLACES PHASE
noobaa-operator.v4.9.0 NooBaa Operator 4.9.0 Succeeded
ocs-operator.v4.9.0 OpenShift Container Storage 4.9.0 ocs-operator.v4.8.2 Succeeded
odf-operator.v4.9.0 OpenShift Data Foundation 4.9.0 Succeeded
# You can see storage system was automatically created:
pbalogh@pbalogh-mac bug-storageCluster $ oc get storageSystem -n openshift-storage
NAME STORAGE-SYSTEM-KIND STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem storagecluster.ocs.openshift.io/v1 ocs-storagecluster
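The auto-created StorageSystem above points back at the existing StorageCluster. Its shape is roughly the following sketch: the `apiVersion` (`odf.openshift.io/v1alpha1`) is my assumption for ODF 4.9, while the `spec.kind` and `spec.name` values come straight from the `oc get storageSystem` columns above.

```shell
#!/bin/sh
# Approximate manifest of the auto-created StorageSystem.
# ASSUMPTION: apiVersion odf.openshift.io/v1alpha1; the kind/name values
# are taken from the 'oc get storageSystem' output in this bug.
manifest='apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
metadata:
  name: ocs-storagecluster-storagesystem
  namespace: openshift-storage
spec:
  kind: storagecluster.ocs.openshift.io/v1
  name: ocs-storagecluster
  namespace: openshift-storage'
printf '%s\n' "$manifest"
```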
pbalogh@pbalogh-mac bug-storageCluster $ oc get pod -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-6t8pg 3/3 Running 0 2m13s
csi-cephfsplugin-d52cr 3/3 Running 0 96s
csi-cephfsplugin-provisioner-5c576d45fd-g7xvp 6/6 Running 0 2m11s
csi-cephfsplugin-provisioner-5c576d45fd-pdwh9 6/6 Running 0 2m11s
csi-cephfsplugin-qxd7t 3/3 Running 0 2m6s
csi-rbdplugin-44mfq 3/3 Running 0 99s
csi-rbdplugin-gvcfh 3/3 Running 0 118s
csi-rbdplugin-provisioner-65cffcfcc6-kckt6 6/6 Running 0 2m14s
csi-rbdplugin-provisioner-65cffcfcc6-vh7s7 6/6 Running 0 2m14s
csi-rbdplugin-zjtgz 3/3 Running 0 2m17s
noobaa-core-0 1/1 Running 0 55s
noobaa-db-pg-0 1/1 Running 0 84s
noobaa-endpoint-6785755654-nlhq7 1/1 Terminating 0 2m9s
noobaa-endpoint-7dff67f58-dmwfl 1/1 Running 0 28s
noobaa-operator-67cb9f49d5-96sjs 1/1 Running 0 2m19s
ocs-metrics-exporter-967cf6678-5dsch 1/1 Running 0 2m35s
ocs-operator-5f8f466f96-lq2j9 1/1 Running 0 2m34s
odf-console-86f754777d-qg78p 1/1 Running 0 3m26s
odf-operator-controller-manager-8998f8c96-l6nrk 2/2 Running 0 3m26s
rook-ceph-crashcollector-04eaaebff1e6c46ea57254eec81feaec-fgfv2 1/1 Running 0 2m28s
rook-ceph-crashcollector-666fac79ec87e8d94ce46ce74ba6005f-kc5kj 1/1 Running 0 2m28s
rook-ceph-crashcollector-d5c6ccdcec1a0c0065186ee8bb5cd245-rsx89 1/1 Running 0 2m28s
rook-ceph-detect-version--1-g5qvn 0/1 Init:0/1 0 3s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-d5949b56s5sz5 2/2 Running 0 44s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6c9d847b85qkx 2/2 Running 0 34s
rook-ceph-mgr-a-5ddd548656-zwvlq 2/2 Running 0 13m
rook-ceph-mon-a-69756fd649-cjp75 2/2 Running 0 2m5s
rook-ceph-mon-b-75b64c9b7-kj555 2/2 Running 0 97s
rook-ceph-mon-c-849fc857df-9mgrk 2/2 Running 0 27s
rook-ceph-operator-5688f5b8d-j5tfp 1/1 Running 0 2m35s
rook-ceph-osd-0-66978fd5bd-st8sx 2/2 Running 0 12m
rook-ceph-osd-1-68d96b456-9wmm7 2/2 Running 0 12m
rook-ceph-osd-2-858d786567-dm2km 2/2 Running 0 12m
rook-ceph-osd-prepare-ocs-deviceset-gp2-0-data-0d66wf--1-5hp9t 0/1 Completed 0 12m
rook-ceph-osd-prepare-ocs-deviceset-gp2-1-data-0k9fg5--1-l66dm 0/1 Completed 0 12m
rook-ceph-osd-prepare-ocs-deviceset-gp2-2-data-0nzfvs--1-mjd7n 0/1 Completed 0 12m
I also recorded the whole upgrade, so once I get a link to the recording I can share it if you want to see how it went.
Marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:5086