Bug 1879919 - [External] Upgrade mechanism from OCS 4.5 to OCS 4.6 needs to be fixed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat
Component: rook
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Sébastien Han
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: 1881071
Reported: 2020-09-17 11:03 UTC by Rachael
Modified: 2020-12-17 06:24 UTC (History)

Fixed In Version: 4.6.0-110.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1881071
Environment:
Last Closed: 2020-12-17 06:24:14 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift rook pull 124 | 0 | None | closed | Bug 1879919: ceph: only run version check for monitoring | 2020-11-10 12:47:42 UTC
Github | rook rook pull 6276 | 0 | None | closed | ceph: only run version check for monitoring | 2020-11-10 12:47:42 UTC
Github | rook rook pull 6353 | 0 | None | closed | ceph: fix obc upgrade from 1.3 to 1.4 external cluster | 2020-11-10 12:47:42 UTC
Red Hat Product Errata | RHSA-2020:5605 | 0 | None | None | None | 2020-12-17 06:24:35 UTC

Description Rachael 2020-09-17 11:03:48 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

OCS upgrade from 4.5 to 4.6 in external mode is broken. The following error messages were seen in the rook-operator logs after the upgrade:

2020-09-17 10:01:51.840563 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-external-storagecluster-cephcluster": failed to configure external ceph cluster: failed to configure external cluster monitoring: failed to create or update mgr endpoint: failed to create endpoint "rook-ceph-mgr-external". Endpoints "rook-ceph-mgr-external" is invalid: [subsets[0].addresses[0].ip: Invalid value: "": must be a valid IP address, (e.g. 10.9.8.7), subsets[0].addresses[0].ip: Invalid value: "": must be a valid IP address]
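
The Endpoints object above was rejected because the address was an empty string. As an illustration only (this is not Rook's code; the function names and the dictionary layout are assumptions made for the example), a minimal Python sketch of validating the monitoring IP before building the endpoint subsets could look like this:

```python
import ipaddress


def is_valid_endpoint_ip(ip: str) -> bool:
    """Return True only if ip parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(ip)
        return True
    except ValueError:
        # An empty string, as seen in the log above, ends up here.
        return False


def build_mgr_endpoint_subsets(ip: str, port: int) -> list:
    """Hypothetical helper: build the 'subsets' part of an Endpoints object.

    Raises ValueError early instead of letting the API server reject the object.
    """
    if not is_valid_endpoint_ip(ip):
        raise ValueError(f"monitoring endpoint IP {ip!r} is not a valid IP address")
    return [{"addresses": [{"ip": ip}], "ports": [{"port": port, "protocol": "TCP"}]}]
```

With a check like this, a missing monitoring IP on a 4.5 external cluster would surface as a clear local error rather than as an API validation failure.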

It was also observed that RGW OBCs were stuck in Pending state after the upgrade with the following error:

2020-09-17 09:47:18.134717 I | op-bucket-prov: getting storage class "ocs-external-storagecluster-ceph-rgw"
E0917 09:47:18.136502       8 controller.go:197] error syncing 'failure/rgw-1': error provisioning bucket: failed to get cephObjectStore: error getting cephObjectStore: resource name may not be empty, requeuing
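
To check a later operator log for the same two failures, something like the following Python sketch may help. The signature names and the `classify` helper are made up for this example; the patterns are taken from the log lines quoted above.

```python
import re

# Hypothetical labels mapped to patterns copied from the log lines in this report.
SIGNATURES = {
    "mgr-endpoint-empty-ip": re.compile(
        r'failed to create endpoint "rook-ceph-mgr-external"'
    ),
    "obc-empty-objectstore": re.compile(
        r"error provisioning bucket: .*resource name may not be empty"
    ),
}


def classify(line: str) -> list:
    """Return the labels of all known failure signatures found in a log line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]
```

Running `classify` over each line of the rook operator log would show whether a new reproduction hits the same errors or different ones.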

OCS 4.6 adds a monitoring IP and extra permissions, so these need to be updated on the 4.5 external mode cluster before or during the upgrade.

Version of all relevant components (if applicable):
OCP version: 4.6.0-0.nightly-2020-09-17-004654

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.6.0-564.ci   OpenShift Container Storage   4.6.0-564.ci   ocs-operator.v4.5.0-560.ci   Succeeded
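
For scripted checks, the `oc get csv` table above can be parsed by column offset, since the DISPLAY value contains spaces. This is a rough Python sketch; `parse_table` and `csv_succeeded` are hypothetical helper names, and it assumes the columns are aligned as in the output shown.

```python
import re


def parse_table(text: str) -> list:
    """Parse aligned CLI table output into a list of dicts keyed by header name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    # Record each column name together with the offset where it starts.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for ln in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else len(ln)
            row[name] = ln[start:end].strip()
        rows.append(row)
    return rows


def csv_succeeded(rows: list, version_prefix: str) -> bool:
    """True if some CSV with the given version prefix is in phase Succeeded."""
    return any(
        r["VERSION"].startswith(version_prefix) and r["PHASE"] == "Succeeded"
        for r in rows
    )
```

In this report the CSV reached Succeeded even though the reconcile failed, so a check like this is necessary but not sufficient to confirm a healthy upgrade.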

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Not able to perform a successful upgrade to newer OCS version

Is there any workaround available to the best of your knowledge?
Not that I am aware of

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:
Not a regression

Steps to Reproduce:

1. Upgrade OCS 4.5 external mode cluster to OCS 4.6

  - oc edit catsrc/ocs-catalogsource -n openshift-marketplace -> Change image to OCS 4.6 image
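
The interactive `oc edit` step above could also be done non-interactively with a merge patch on `spec.image`. The Python sketch below only builds the command; the image reference is a placeholder, not the image used in this report, and `catalogsource_patch_cmd` is a hypothetical helper.

```python
import json
import shlex


def catalogsource_patch_cmd(image: str,
                            name: str = "ocs-catalogsource",
                            namespace: str = "openshift-marketplace") -> list:
    """Build an `oc patch` argument list that sets spec.image on a CatalogSource."""
    patch = json.dumps({"spec": {"image": image}})
    return ["oc", "patch", "catsrc", name, "-n", namespace,
            "--type", "merge", "-p", patch]


if __name__ == "__main__":
    # Placeholder image; substitute the actual OCS 4.6 registry image.
    cmd = catalogsource_patch_cmd("registry.example.com/ocs-registry:4.6.0")
    print(shlex.join(cmd))
```

The resulting argument list can be passed to `subprocess.run` or pasted into a shell after review.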


Actual results:
The upgrade is not successful. The RGW OBC is stuck in Pending state and the cluster reconcile fails.


Expected results:
Upgrade should be successful

Comment 13 Travis Nielsen 2020-09-30 19:40:45 UTC
@Rachael Please share the rook operator log from the latest repro to confirm if the error is exactly the same or if it is different now.

Comment 19 errata-xmlrpc 2020-12-17 06:24:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

