Description of problem (please be as detailed as possible and provide log snippets):

An external mode OCS cluster depends on resources made available by the RHCS cluster, such as the CephBlockPool, the CephFS pools, and the RGW endpoint. The storage classes in OCS are created from the JSON output that contains these details. If any of these resources change, for example due to a disruption on the RHCS side, the existing OCS resources are affected. Some of these failures can be:

- Addition/deletion of CephFS pools
- Addition/deletion of the CephBlockPool being used by OCS
- Addition/deletion of RGW endpoints

The request here is to re-configure the OCS resources, such as the storage classes, so that they can use the updated parameters from the RHCS cluster, without having to re-deploy OCS.

Version of all relevant components (if applicable):
OCS version: 4.5.0-526.ci
OCP version: 4.5.0-0.nightly-2020-08-20-051434

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Additional info:
As a disruptive test case, the CephBlockPool was deleted and re-created with the same name. Although old PVCs were not accessible, we were able to provision new PVCs.
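For context, the dependency on the external cluster is visible directly in the RBD storage class, whose parameters embed the external pool name. Below is a minimal sketch of such a storage class; the provisioner and secret references follow the usual OCS external-mode layout, but the exact names (storage class name, clusterID, pool) are assumptions for illustration:

    # Sketch of an external-mode RBD StorageClass; resource names are illustrative.
    # The "pool" parameter is the tie to the RHCS cluster: if that pool is deleted
    # or renamed on the RHCS side, provisioning against this class breaks.
    cat <<EOF | oc apply -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ocs-external-storagecluster-ceph-rbd   # assumed name
    provisioner: openshift-storage.rbd.csi.ceph.com
    parameters:
      clusterID: openshift-storage
      pool: rbd-pool-from-rhcs                     # pool name from the exported JSON
      imageFormat: "2"
      imageFeatures: layering
      csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
      csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
      csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
      csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
    reclaimPolicy: Delete
    EOF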
OCS does not control the external cluster. If the admin of the external cluster performs destructive actions like deleting a pool, there is nothing OCS can do to recover the PVCs using that pool. As you noticed, new volumes can at least be created.

Moving the RFE out to 4.7 for discussion. But this is a very broad request; we will really need to evaluate individual scenarios to see whether they can even be supported.

In some scenarios, the admin could delete the OCS initialization CR and the issue could be resolved by re-creating the storage class. In other cases there may be nothing we can do. We need specific scenarios.

In some scenarios, the Rook CRs can be updated; in others, it may be the OCS CRs that need updating. Either way, in OCS this would be driven by the OCS operator.

Moving to the OCS component in case Jose sees a more general solution, but I would recommend this general BZ be closed and more specific scenarios be tracked individually instead.
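As a rough sketch of the workaround hinted at above (delete the initialization CR and let the operator re-create the storage class), the commands might look like the following. This is not a verified procedure; the CR name (ocsinit) and storage class name are assumptions, and whether this helps depends on the scenario:

    # Hypothetical recovery flow; resource names are assumptions.
    # 1. Remove the stale storage class that points at the old pool.
    oc delete storageclass ocs-external-storagecluster-ceph-rbd

    # 2. Delete the OCS initialization CR; the ocs-operator re-creates it and
    #    re-runs initialization, which should re-create the default storage classes.
    oc delete ocsinitialization ocsinit -n openshift-storage

    # 3. Confirm the storage class came back and points at the expected pool.
    oc get storageclass ocs-external-storagecluster-ceph-rbd -o yaml | grep pool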
Maybe I'm missing the point of this RFE, but if changes in the external cluster's resources are not reflected on the OCS side, isn't staying in sync with them the most important thing in external mode?
(In reply to Travis Nielsen from comment #3)
> OCS does not control the external cluster. If the admin of the external
> cluster does destructive actions like deleting a pool, there is nothing OCS
> can do to recover the PVCs using that pool. As you noticed, new volumes can
> at least be created.

New volumes can be created only if the re-created pool has the same old name; otherwise, we cannot create new PVCs either. So we should have a way to reconfigure/re-initialize the storage class with the new pool name (in case the pool name differs from the original), to enable new PVC creation.

As for old volumes, we agree that we cannot expect to use them if the underlying pool is deleted. But at least for the noobaa-db PV, there should be a way to recover NooBaa without an OCS re-install. The same issue will be seen in internal mode too, if the noobaa-db PV has problems. We should have a way to recover NooBaa: even if old data is lost, new things should work, and that can only happen if we have a way to recover NooBaa when its DB is lost.

> Moving the RFE out to 4.7 for discussion. But this is a very broad request.
> We will really need to evaluate individual scenarios to see if they can even
> be supported.
>
> In some scenarios, the admin could delete the OCS initialization CR and the
> issue could be resolved by re-creating the storage class. But in other cases
> there may be nothing we can do. We need specific scenarios.
>
> In some scenarios, the Rook CRs can be updated. In other cases, they may be
> OCS CRs to be updated. Either way, in OCS this would be driven by the OCS
> operator.
>
> Moving to the OCS component in case Jose sees a more general solution, but I
> would recommend this general BZ be closed and instead track more specific
> scenarios individually.
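Since Kubernetes StorageClass parameters are immutable, re-pointing provisioning at a renamed pool currently means deleting and re-creating the class by hand. A minimal sketch of that manual step, with all resource and pool names assumed for illustration:

    # Manually re-point the RBD storage class at a renamed external pool.
    # StorageClass parameters are immutable, so the object must be re-created.
    # All names below are assumptions for illustration.
    oc get storageclass ocs-external-storagecluster-ceph-rbd -o yaml > sc.yaml

    # Swap in the new pool name and strip server-set fields so the
    # manifest can be re-created cleanly.
    sed -i -e 's/pool: old-pool-name/pool: new-pool-name/' \
           -e '/resourceVersion:/d' -e '/uid:/d' -e '/creationTimestamp:/d' sc.yaml

    oc delete storageclass ocs-external-storagecluster-ceph-rbd
    oc create -f sc.yaml

    # New PVCs against this class should now provision in the new pool;
    # PVs bound under the old pool remain unrecoverable.

It would be preferable for this to be driven by the operator rather than done by hand, which is essentially what this RFE is asking for.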
Don't think this will ever be prioritized.