Bug 1859229 - Rook should delete extra MON PVCs in case first reconcile takes too long and rook skips "b" and "c" (spawned from Bug 1840084#c14)
Summary: Rook should delete extra MON PVCs in case first reconcile takes too long and ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Travis Nielsen
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-21 13:53 UTC by Neha Berry
Modified: 2020-12-17 06:23 UTC
CC: 4 users

Fixed In Version: 4.6.0-98.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:23:00 UTC
Embargoed:




Links
Github openshift rook pull 125 (closed): Bug 1873118: Backport mon pv cleanup (last updated 2020-10-27 14:03:14 UTC)
Red Hat Product Errata RHSA-2020:5605 (last updated 2020-12-17 06:23:39 UTC)

Description Neha Berry 2020-07-21 13:53:54 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
--------------------------------------------------------------
This BZ is spawned from the comment here - Bug 1840084#c16.

With the fix for Bug 1840084, the rook operator deletes the unused MON PVCs in scenarios where quorum is first established with a, b, c and a failover occurs later.

However, mons b and c are skipped if the first reconcile takes too long, and their PVCs are not deleted by the rook operator. This BZ is raised to track that particular use case.

More details (see Bug 1840084#c0 and Bug 1840084#c14)
-------------------

In one of my normal installations on VMware (on somewhat slower hardware, where clock skews are also detected), mon-a, mon-d and mon-e were ultimately added to quorum instead of mon-a, b, c, but the mon-b and mon-c PVCs were left unused and not deleted by the operator.


Logs copied here - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1840084-after-deployment/

Zipped folder = http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1840084-after-deployment.zip

P.S.: The steps to reproduce exactly match Comment #0.
1. Deploy OCS on vSphere
2. Check the pods and PVCs


>> PODS
rook-ceph-mon-a-5dc69f5f85-6mdcm                                  1/1     Running     0          78m   10.128.2.19   compute-0   <none>           <none>
rook-ceph-mon-d-85dbb6c849-b6zmg                                  1/1     Running     0          75m   10.129.2.20   compute-1   <none>           <none>
rook-ceph-mon-e-766fc9bd4f-499dt                                  1/1     Running     0          75m   10.131.0.16   compute-2   <none>           <none>


>> PVC

openshift-storage          rook-ceph-mon-a                             Bound    pvc-9272fac1-1b03-4285-9d25-d3eea4bdff81   10Gi       RWO            thin                          80m
openshift-storage          rook-ceph-mon-b                             Bound    pvc-49a560d0-bd94-4b65-a989-1317f8a0b668   10Gi       RWO            thin                          79m
openshift-storage          rook-ceph-mon-c                             Bound    pvc-c4248d3c-7724-4362-ada0-18405a1274b9   10Gi       RWO            thin                          79m
openshift-storage          rook-ceph-mon-d                             Bound    pvc-52de08b7-e6e2-4439-a5ea-ef37b041fc49   10Gi       RWO            thin                          77m
openshift-storage          rook-ceph-mon-e                             Bound    pvc-595a988d-06fc-41c8-9c69-519816b85069   10Gi       RWO            thin                          76m
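
For reference, listings like the ones above can be gathered with commands along these lines (just a sketch, assuming the default openshift-storage namespace):

# mon pods with node placement
oc -n openshift-storage get pods -o wide | grep rook-ceph-mon
# mon PVCs across namespaces
oc get pvc --all-namespaces | grep rook-ceph-mon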

__________________



Version of all relevant components (if applicable):
--------------------------------------------------------
 
OCS= 4.5.0-493.ci
OCP = 4.5.0-0.nightly-2020-07-17-032241
Ceph = RHCS 4.1 - 14.2.8-59.el8cp



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
--------------------------------------------
No

Is there any workaround available to the best of your knowledge?
--------------------------------------
Manually delete the unused PVCs
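For example (a sketch using the PVC names from the description, assuming the openshift-storage namespace and that mon-b and mon-c are the orphaned ones):

oc -n openshift-storage delete pvc rook-ceph-mon-b rook-ceph-mon-c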

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
-----------------------------------------------
3 and intermittently seen. 

Is this issue reproducible?
------------------------------
Intermittently.


Can this issue be reproduced from the UI?
--------------------------------
Yes

If this is a regression, please provide more details to justify this:
--------------------------------------------
No. This behavior has existed since OCS 4.2.

Steps to Reproduce:
-------------------
1. Install OCS on VMware
2. Check the OCS pods and the MON and OSD PVCs.


Actual results:
---------------------
If the first reconcile takes time, we sometimes get mon-a, mon-d and mon-e instead of the usual mon-a, b, c, but the unused PVCs for mon-b and mon-c are still left in the cluster.


Expected results:
-----------------------

Rook should also delete the unused PVCs left over from the first reconcile (post fresh deployment).

Additional info:

Comment 3 Travis Nielsen 2020-07-21 18:52:06 UTC
Moving to 4.6 per comments above

Comment 6 Travis Nielsen 2020-09-21 18:55:50 UTC
I believe the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1873118 will cover this scenario as well.

Comment 7 Neha Berry 2020-09-24 13:42:32 UTC
Hi Travis,

Since an initial slow deployment resulting in mon-a, d and e is not easily reproducible (it depends on the state of the hardware at times), can you help with the steps to verify this BZ?

Comment 8 Travis Nielsen 2020-09-24 18:46:53 UTC
I don't know an easy way to repro this specific scenario either. But the linked fix will delete any orphaned mon PVC that has the label "app=rook-ceph-mon" and does not have a corresponding mon deployment. The orphaned mons will only be removed after there is a failover of some mon. 

To verify the general fix, you could manually create a PVC that matches the label, then stop a mon, wait 10 minutes for the mon failover, then verify that the orphaned PVCs are removed after the mon failover.
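
A possible sequence for that verification, sketched with hypothetical names and assuming the openshift-storage namespace, the "thin" storage class from the description, and that scaling a mon deployment to 0 is an acceptable way to stop a mon:

# 1. Manually create a dummy PVC carrying the mon label (the name is made up for this test)
cat <<EOF | oc -n openshift-storage apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-ceph-mon-orphan-test
  labels:
    app: rook-ceph-mon
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: thin
EOF

# 2. Stop one mon so the operator fails it over (default timeout is ~10 minutes)
oc -n openshift-storage scale deployment rook-ceph-mon-a --replicas=0

# 3. After the failover, orphaned PVCs carrying the label should be removed
oc -n openshift-storage get pvc -l app=rook-ceph-mon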

Comment 11 Neha Berry 2020-10-01 10:52:55 UTC
@travis

Also, if you confirm that reusing a manually created PVC (without all the appropriate labels) is not an issue, then IIUC the fix is working fine and this BZ can be moved to the verified state.

Please note: I also created a PVC with a "rook-ceph-mon-xx" name but with no label (app=rook-ceph-mon), and as expected, the PVC was not cleaned up during reconciliation.

Comment 12 Travis Nielsen 2020-10-01 16:47:40 UTC
@Neha The validation steps are very thorough, thanks for all the details, including that we didn't remove any PVCs without the label. I do feel confident that we can move the BZ to verified state.

I'm not worried about using the manually created PVC without all the labels. The labels are mostly a convenience for the users to query them if needed, we just query certain labels like the "app=rook-ceph-mon". But we should consider it an unsupported scenario if resources like PVCs are created manually. We do assume that resources exist with certain names. If they already exist, we use them and usually update them, although PVCs aren't updated. For example, if we found an existing deployment, we would update it with all desired pod spec if anything is different than expected.
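
As an illustration, the mon PVCs Rook queries by that label can be listed, and an individual PVC's labels inspected (a sketch, assuming the openshift-storage namespace):

oc -n openshift-storage get pvc -l app=rook-ceph-mon
oc -n openshift-storage get pvc rook-ceph-mon-a --show-labels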

Comment 13 Neha Berry 2020-10-05 14:02:04 UTC
(In reply to Travis Nielsen from comment #12)
> @Neha The validation steps are very thorough, thanks for all the details,
> including that we didn't remove any PVCs without the label. I do feel
> confident that we can move the BZ to verified state.
> 
thanks Travis. Will move this BZ to verified state.

> I'm not worried about using the manually created PVC without all the labels.
> The labels are mostly a convenience for the users to query them if needed,
> we just query certain labels like the "app=rook-ceph-mon". But we should
> consider it an unsupported scenario if resources like PVCs are created
> manually.

So how shall we handle it and not allow users to use a manually created PVC? How do we mark it as unsupported? Via docs?

> We do assume that resources exist with certain names. If they
> already exist, we use them and usually update them, although PVCs aren't
> updated. For example, if we found an existing deployment, we would update it
> with all desired pod spec if anything is different than expected.

I forgot to test one thing: if I had created a manual PVC rook-ceph-mon-d but did not add the label (app=rook-ceph-mon), would rook have used this PVC or created a new one with the appropriate labels (app=rook-ceph-mon, plus all other labels)?

If yes to the above query, then wouldn't it be better to say: if a manual PVC for mon-d is already present, create a new PVC, e.g. mon-e, and create the rook-ceph-mon-e deployment using the new rook PVC?

In that way, the manual PVC would stay unused. Just a query.

Comment 14 Travis Nielsen 2020-10-05 16:34:05 UTC
(In reply to Neha Berry from comment #13)
> > I'm not worried about using the manually created PVC without all the labels.
> > The labels are mostly a convenience for the users to query them if needed,
> > we just query certain labels like the "app=rook-ceph-mon". But we should
> > consider it an unsupported scenario if resources like PVCs are created
> > manually.
> 
> So how shall we handle it and not allow users to use a manually created PVC
> ? How do we mark it as unsupported? via docs ?

This is one of those corner cases where the admin could do a lot of unsupported things if they chose to pre-create resources, delete resources, or otherwise do malicious things to Rook. If somebody is intentionally manipulating rook resources, we just aren't going to support it, and I don't think we need to document that fact either. I hope it's common sense.

> > We do assume that resources exist with certain names. If they
> > already exist, we use them and usually update them, although PVCs aren't
> > updated. For example, if we found an existing deployment, we would update it
> > with all desired pod spec if anything is different than expected.
> 
> I forgot to test one thing - in case I created a manual PVC -
> rook-ceph-mon-d but did not add the label (app=rook-ceph-mon) , would rook
> have used this PVC or created a new one with appropriate label
> (app=rook-ceph-mon, + all other labels) ?
> 
> If yes to above query, then wouldn't it be better , to say if manual PVC
> with mon-d is already present, create a new PVC, e.g mon-e and create
> rook-ceph-mon-e deployment using the new rook PVC ?
> 
> In that way, the maunal PVC would stay unused. Just a query.

Rook would still use the PVC even if it didn't have that label, but I believe it would just not be deleted by the orphan PVC reaper that was just added for this BZ. 

Again, I would say we don't need to worry about this scenario since it's such a corner case. Nobody has a reason to create PVCs that Rook expects.

Comment 15 Neha Berry 2020-10-06 14:51:58 UTC
Ack. Thanks @travis for the confirmation.

Based on Comment #8 through Comment #14, moving the BZ to verified state.

Comment 17 errata-xmlrpc 2020-12-17 06:23:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

