Description of problem (please be as detailed as possible and provide log snippets):
OCS upgrade from 4.5 to 4.6 in external mode is broken. The following error messages were seen in the rook-operator logs after the upgrade:
2020-09-17 10:01:51.840563 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-external-storagecluster-cephcluster": failed to configure external ceph cluster: failed to configure external cluster monitoring: failed to create or update mgr endpoint: failed to create endpoint "rook-ceph-mgr-external". Endpoints "rook-ceph-mgr-external" is invalid: [subsets.addresses.ip: Invalid value: "": must be a valid IP address, (e.g. 10.9.8.7), subsets.addresses.ip: Invalid value: "": must be a valid IP address]
It was also observed that RGW OBCs were stuck in Pending state after the upgrade with the following error:
2020-09-17 09:47:18.134717 I | op-bucket-prov: getting storage class "ocs-external-storagecluster-ceph-rgw"
E0917 09:47:18.136502 8 controller.go:197] error syncing 'failure/rgw-1': error provisioning bucket: failed to get cephObjectStore: error getting cephObjectStore: resource name may not be empty, requeuing
OCS 4.6 adds a monitoring IP and extra permissions for external mode. These need to be updated on the 4.5 external mode cluster before or during the upgrade.
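To illustrate the first error above, here is a minimal Python sketch of the kind of check that rejects the endpoint. It is not the actual Kubernetes or Rook validation code; the function name `invalid_endpoint_ips` and the sample data are hypothetical, and the sketch only assumes that each address in the endpoint subsets must parse as an IP address, as the error message states.

```python
import ipaddress


def invalid_endpoint_ips(subsets):
    """Return the address values in Endpoints-like subsets that are not valid IPs."""
    bad = []
    for subset in subsets:
        for addr in subset.get("addresses", []):
            ip = addr.get("ip", "")
            try:
                ipaddress.ip_address(ip)
            except ValueError:
                # An empty string (no monitoring IP configured) ends up here.
                bad.append(ip)
    return bad


# Hypothetical endpoint data: the mgr address is empty, as in the log above.
subsets = [{"addresses": [{"ip": ""}]}]
print(invalid_endpoint_ips(subsets))  # prints ['']
```

An empty monitoring IP, as would be left over from a 4.5 external mode configuration, fails this kind of check, while a value such as 10.9.8.7 passes.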
Version of all relevant components (if applicable):
OCP version: 4.6.0-0.nightly-2020-09-17-004654
$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.6.0-564.ci   OpenShift Container Storage   4.6.0-564.ci   ocs-operator.v4.5.0-560.ci   Succeeded
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Not able to perform a successful upgrade to a newer OCS version.
Is there any workaround available to the best of your knowledge?
Not that I am aware of
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Not a regression
Steps to Reproduce:
1. Upgrade OCS 4.5 external mode cluster to OCS 4.6
- oc edit catsrc/ocs-catalogsource -n openshift-marketplace -> Change image to OCS 4.6 image
Actual results:
The upgrade is not successful. RGW OBCs are stuck in Pending state and the reconcile fails.
Expected results:
The upgrade should be successful.
@Rachael Please share the rook operator log from the latest repro to confirm if the error is exactly the same or if it is different now.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.