Bug 2107206 - ODF upgrade from 4.9 to 4.10 with RHCS 5.0.4 external will result in inability to create new RWX PVs [NEEDINFO]
Summary: ODF upgrade from 4.9 to 4.10 with RHCS 5.0.4 external will result in inability to create new RWX PVs
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Jose A. Rivera
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-14 13:56 UTC by gsternag@redhat.com
Modified: 2023-08-09 17:00 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-12 11:10:18 UTC
Embargoed:
bkunal: needinfo? (gsternag)



Description gsternag@redhat.com 2022-07-14 13:56:34 UTC
Description of problem (please be as detailed as possible and provide log snippets):
An ODF operator update from version 4.9 to 4.10 with external RHCS does not check the external Ceph version, and afterwards new RWX PVs can no longer be created.

Version of all relevant components (if applicable):
ODF 4.10
RHCS 5.0.4

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
Possibly: upgrading RHCS to the latest release (5.1.2), but that upgrade itself fails (BZ #2107203).

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install or use an RHCS 5.0.4 cluster for use with ODF.
2. Upgrade the ODF Operator from 4.9 to 4.10.
3. Try to create an RWX (CephFS-based) PV (a minimal example sketch follows below).
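
For reference, a minimal sketch of step 3 using the Kubernetes Python client. The PVC name, namespace, and the storage class name "ocs-external-storagecluster-cephfs" are illustrative assumptions (the bug text only shows "storageclass ocs-external"); adjust them to whatever the affected cluster actually uses. If the bug is present, the claim stays in Pending instead of reaching Bound.

    from kubernetes import client, config

    # Minimal sketch: request an RWX (CephFS-backed) PVC after the ODF upgrade.
    # All names below are illustrative assumptions, not taken from this bug.
    config.load_kube_config()
    core_v1 = client.CoreV1Api()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="rwx-upgrade-test"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],  # RWX
            storage_class_name="ocs-external-storagecluster-cephfs",  # assumed CephFS class
            resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
        ),
    )
    core_v1.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)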


Actual results:
PV creation fails; the PVC remains stuck in Pending (storageclass ocs-external).

Expected results:
The RWX PVC is bound and the PV is created successfully.

Additional info:
The same thing happened during the ODF 4.7 -> 4.8 upgrade with RHCS 4.1 installed; upgrading to RHCS 4.1z4 resolved the issue.

Comment 2 gsternag@redhat.com 2022-07-15 11:10:02 UTC
Updating RHCS from 5.0.4 to 5.1.2 fixed the problem. What remains is the following expectation:
a) We document this properly in the ODF release notes or the ODF admin guide.
b) For an update of an existing ODF external deployment, the ODF Operator must check which Ceph release is installed before it actually installs. If it is an unsupported Ceph release, it should report that and refuse to install the new ODF Operator (see the sketch after this list). Otherwise a user ends up with an unusable OpenShift storage environment, which is definitely not desirable.
c) We remove this strong binding between ODF versions and Ceph versions. No customer will want to upgrade their Ceph cluster every time an ODF update is available. It is simply not feasible from an operational point of view unless that Ceph cluster is used solely for OpenShift workloads. You wouldn't want to have to upgrade your AWS EBS versions or NetApp firmware either.
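
As a rough illustration of the check suggested in (b), the sketch below parses the JSON summary printed by the `ceph versions` command (run with credentials for the external cluster) and refuses to proceed when any running daemon is older than an assumed minimum release. The MIN_SUPPORTED constant and the idea of gating the operator install on this result are assumptions for discussion, not the operator's actual logic or the official support matrix.

    import json
    import subprocess

    # Illustrative placeholder; the real minimum supported Ceph release for a
    # given ODF version would have to come from the ODF support matrix.
    MIN_SUPPORTED = (16, 2, 7)

    def parse_version(tag):
        # "ceph version 16.2.0-117.el8cp (<hash>) pacific (stable)" -> (16, 2, 0)
        numeric = tag.split()[2].split("-")[0]
        return tuple(int(part) for part in numeric.split("."))

    def external_ceph_supported():
        # `ceph versions` prints a JSON summary of the running daemon versions;
        # assumed here to be run against the external cluster.
        raw = subprocess.check_output(["ceph", "versions"])
        overall = json.loads(raw).get("overall", {})
        return bool(overall) and all(
            parse_version(tag) >= MIN_SUPPORTED for tag in overall
        )

    if not external_ceph_supported():
        raise SystemExit(
            "External Ceph release is below the assumed supported minimum; "
            "refusing to proceed with the ODF operator upgrade."
        )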

Comment 3 Bipin Kunal 2022-07-15 16:19:04 UTC
Hi Gerald,

   Good to know that you don't see the issue with 5.1.2.

   I confirmed with the QE team that we have automated upgrade tests in external mode, and we run all tier 1 tests post upgrade. As part of the tier 1 tests we do create RWX PVs, and we did not observe any issue.

  Today I tested it as well with RHCS 5.0.4, then upgraded ODF 4.9.9 to ODF 4.10.4, and did not observe any issue either. Maybe I was just lucky.

  If you are able to consistently reproduce the issue, I would appreciate more details on the sequence of steps executed and the errors you see during RWX PV creation, along with an ODF must-gather (a small snippet that captures the PVC state is sketched below).
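
In case it helps with gathering those details, a small sketch (Kubernetes Python client again) that records the stuck claim's phase and its events, which could be attached alongside the must-gather. The PVC name and namespace are assumptions and should be replaced with those of the failing claim.

    from kubernetes import client, config

    # Assumed identifiers for the failing claim; replace with the real ones.
    PVC_NAME = "rwx-upgrade-test"
    NAMESPACE = "default"

    config.load_kube_config()
    core_v1 = client.CoreV1Api()

    pvc = core_v1.read_namespaced_persistent_volume_claim(PVC_NAME, NAMESPACE)
    print(f"PVC {PVC_NAME} phase: {pvc.status.phase}")  # expected Bound, reportedly Pending

    events = core_v1.list_namespaced_event(
        NAMESPACE,
        field_selector=f"involvedObject.name={PVC_NAME},involvedObject.kind=PersistentVolumeClaim",
    )
    for ev in events.items:
        # Provisioning errors from the CephFS CSI provisioner show up here.
        print(ev.last_timestamp, ev.reason, ev.message)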

-Bipin Kunal

Comment 4 Nitin Goyal 2022-10-12 11:10:18 UTC
I am closing this bug as there has been no activity on it for a few months. Please reopen it if you feel otherwise.

