Bug 1872755 - [RFE][External mode] Re-sync OCS and RHCS for any changes on the RHCS cluster that will affect the OCS cluster
Summary: [RFE][External mode] Re-sync OCS and RHCS for any changes on the RHCS cluster...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mudit Agarwal
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-26 14:50 UTC by Rachael
Modified: 2023-08-09 17:00 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-01 13:24:13 UTC
Embargoed:


Description Rachael 2020-08-26 14:50:03 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

An external mode OCS cluster depends on resources made available by the RHCS cluster, such as the CephBlockPool, the CephFS pools, and the RGW endpoint. The storage classes in OCS are created from the JSON output that contains these details. If any of these resources change, for example due to a disruption on the RHCS side, the existing OCS resources that reference them are affected. Some of these failures can be:

  - Addition/deletion of CephFS pools
  - Addition/deletion of the CephBlockPool used by OCS
  - Addition/deletion of RGW endpoints

The request here is to re-configure OCS resources such as the storage classes so they can use the updated parameters from the RHCS cluster, without having to re-deploy OCS.
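To make the request concrete, the following is a minimal sketch of the kind of drift check being asked for: compare the pool name baked into an existing RBD storage class against a fresh export from the RHCS cluster. All field and pool names here are hypothetical; they do not reflect the exact exporter output.

```python
import json

# Hypothetical excerpt of the JSON produced by the external-cluster export;
# the field names are illustrative, not the exact exporter schema.
exported = json.loads('{"rbd_pool": "replicapool-new", "cephfs_pools": ["cephfs-data0"]}')

# Pool name as baked into the existing RBD storage class at deploy time
# (name is illustrative).
storage_class_params = {"pool": "replicapool"}

def detect_pool_drift(sc_params, export):
    """Return True when the pool referenced by the storage class no longer
    matches the pool reported by the external RHCS cluster."""
    return sc_params.get("pool") != export.get("rbd_pool")

print(detect_pool_drift(storage_class_params, exported))  # True -> re-sync needed
```

In this sketch, a True result is the signal that the operator would need to re-sync the storage class parameters rather than requiring a redeployment.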


Version of all relevant components (if applicable):
OCS version: 4.5.0-526.ci
OCP version: 4.5.0-0.nightly-2020-08-20-051434

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Additional info:
As a disruptive test case, the CephBlockPool was deleted and re-created with the same name. Although the old PVCs were no longer accessible, we were able to provision new PVCs.

Comment 3 Travis Nielsen 2020-08-26 17:33:26 UTC
OCS does not control the external cluster. If the admin of the external cluster performs destructive actions such as deleting a pool, there is nothing OCS can do to recover the PVCs that use that pool. As you noticed, new volumes can at least be created.

Moving the RFE out to 4.7 for discussion. But this is a very broad request. We will really need to evaluate individual scenarios to see if they can even be supported.

In some scenarios, the admin could delete the OCS initialization CR and the issue could be resolved by re-creating the storage class. But in other cases there may be nothing we can do. We need specific scenarios. 

In some scenarios, the Rook CRs can be updated. In other cases, it may be OCS CRs that need to be updated. Either way, in OCS this would be driven by the OCS operator.

Moving to the OCS component in case Jose sees a more general solution, but I would recommend this general BZ be closed and instead track more specific scenarios individually.

Comment 4 Raz Tamir 2020-08-26 18:23:18 UTC
Maybe I'm missing the point of this RFE, but if there are changes in the cluster resources that are not reflected on the OCS side, isn't handling that the most important thing in external mode?

Comment 5 Neha Berry 2020-08-26 18:44:02 UTC
(In reply to Travis Nielsen from comment #3)
> OCS does not control the external cluster. If the admin of the external
> cluster does destructive actions like deleting a pool, there is nothing OCS
> can do to recover the PVCs using that pool. As you noticed, new volumes can
> at least be created. 
> 
New volumes can be created only if the re-created pool has the same name as before; otherwise, we cannot create new PVCs either. So we should have a way to reconfigure/re-initialize the storage class with the new pool name (in case it differs from the original) to enable new PVC creation.
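A minimal sketch of why this needs a re-create rather than an edit: StorageClass parameters are immutable in Kubernetes, so picking up a new pool name means building a replacement manifest and re-creating the object. The manifest below is abbreviated and the names are hypothetical.

```python
import copy

# Hypothetical, abbreviated RBD StorageClass manifest; names are illustrative.
old_sc = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "ocs-external-storagecluster-ceph-rbd"},
    "provisioner": "openshift-storage.rbd.csi.ceph.com",
    "parameters": {"pool": "replicapool"},
}

def rebuild_with_pool(sc, new_pool):
    # StorageClass parameters are immutable in Kubernetes, so a pool rename
    # means deleting the object and re-creating it with the new pool name.
    new_sc = copy.deepcopy(sc)
    new_sc["parameters"]["pool"] = new_pool
    return new_sc

new_sc = rebuild_with_pool(old_sc, "replicapool-renamed")
print(new_sc["parameters"]["pool"])  # replicapool-renamed
```

The deep copy keeps the original manifest intact, mirroring a delete-then-recreate flow where the old object is only removed once the replacement is ready to apply.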

As regards old volumes, we agree that we cannot expect to use them if the underlying pool is deleted. But at least for the noobaa-db PV, there should be a way to recover NooBaa without an OCS re-install.

The same issue will be seen in internal mode too, in case the noobaa-db PV has problems. We should have a way to recover NooBaa (even if old data is lost, new workloads should work; that can only happen if we have a way to recover NooBaa when its DB is lost).


> Moving the RFE out to 4.7 for discussion. But this is a very broad request.
> We will really need to evaluate individual scenarios to see if they can even
> be supported.
> 
> In some scenarios, the admin could delete the OCS initialization CR and the
> issue could be resolved by re-creating the storage class. But in other cases
> there may be nothing we can do. We need specific scenarios. 
> 
> In some scenarios, the Rook CRs can be updated. In other cases, they may be
> OCS CRs to be updated. Either way, in OCS this would be driven by the OCS
> operator. 
> 
> Moving to the OCS component in case Jose sees a more general solution, but I
> would recommend this general BZ be closed and instead track more specific
> scenarios individually.

Comment 24 Mudit Agarwal 2023-08-01 13:24:13 UTC
Don't think this will ever be prioritized.

