Bug 1840084 - The operator should delete the unused PVCs for mons
Summary: The operator should delete the unused PVCs for mons
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Travis Nielsen
QA Contact: Aviad Polak
URL:
Whiteboard:
Duplicates: 1859960
Depends On:
Blocks:
Reported: 2020-05-26 10:50 UTC by Petr Balogh
Modified: 2020-10-20 06:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:17:07 UTC
Embargoed:



Links
System                  ID                   Private  Priority  Status  Summary                                           Last Updated
Github                  rook rook pull 5698  0        None      closed  ceph: During mon failover remove the backing pvc  2021-02-12 10:47:23 UTC
Red Hat Product Errata  RHBA-2020:3754       0        None      None    None                                              2020-09-15 10:17:34 UTC

Description Petr Balogh 2020-05-26 10:50:50 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In my deployment I see this:
pbalogh@MacBook-Pro ocs45 $ oc get pvc -n openshift-storage
NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-0                 Bound    pvc-6fbacf26-a00d-462d-94d1-8d99fa6516d8   50Gi       RWO            ocs-storagecluster-ceph-rbd   15h
ocs-deviceset-0-data-0-q8k4g   Bound    pvc-d8c2eb68-0419-4f02-88a1-d9516eeacb8f   512Gi      RWO            thin                          15h
ocs-deviceset-1-data-0-hswzn   Bound    pvc-3c26fdbe-2914-4a41-8257-2a54a4f406cd   512Gi      RWO            thin                          15h
ocs-deviceset-2-data-0-s6fhd   Bound    pvc-40a28840-3816-4bf1-a584-1734a58ed64f   512Gi      RWO            thin                          15h
rook-ceph-mon-a                Bound    pvc-94fffde3-3056-4693-9852-537a010c3b48   10Gi       RWO            thin                          15h
rook-ceph-mon-b                Bound    pvc-800b8e57-39a7-4f95-9421-977419599da0   10Gi       RWO            thin                          15h
rook-ceph-mon-c                Bound    pvc-61dcaaa7-7ad0-4fc5-9f66-8e58cd082f52   10Gi       RWO            thin                          15h
rook-ceph-mon-d                Bound    pvc-35740e57-bbe1-4462-9c60-ed1becc87fe2   10Gi       RWO            thin                          15h
rook-ceph-mon-e                Bound    pvc-ca6d7e44-fbb2-4799-a3f9-fbd2e7175e32   10Gi       RWO            thin                          15h
rook-ceph-mon-f                Bound    pvc-3860c571-2d85-4d2a-9018-2565fdb37d13   10Gi       RWO            thin                          15h


Pods:

$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-b5622                                            3/3     Running     0          15h
csi-cephfsplugin-provisioner-67655ccbc4-4jlg7                     5/5     Running     0          15h
csi-cephfsplugin-provisioner-67655ccbc4-xm656                     5/5     Running     0          15h
csi-cephfsplugin-rp2xl                                            3/3     Running     0          15h
csi-cephfsplugin-w8xx6                                            3/3     Running     0          15h
csi-rbdplugin-dzq52                                               3/3     Running     0          15h
csi-rbdplugin-jr947                                               3/3     Running     0          15h
csi-rbdplugin-provisioner-75965d8977-jlv7b                        5/5     Running     0          15h
csi-rbdplugin-provisioner-75965d8977-jvh86                        5/5     Running     0          15h
csi-rbdplugin-st495                                               3/3     Running     0          15h
lib-bucket-provisioner-5b9cb4f848-rrstf                           1/1     Running     0          15h
noobaa-core-0                                                     1/1     Running     0          15h
noobaa-db-0                                                       1/1     Running     0          15h
noobaa-endpoint-54698c585-5n97s                                   1/1     Running     0          15h
noobaa-operator-57df96c6c5-bnk5x                                  1/1     Running     0          15h
ocs-operator-549c4598ff-cv2gm                                     1/1     Running     0          15h
rook-ceph-crashcollector-rhel1-0.jnk-vu1rs33-t1.qe.rh-ocs.kdcwx   1/1     Running     0          15h
rook-ceph-crashcollector-rhel1-1.jnk-vu1rs33-t1.qe.rh-ocs.q26cd   1/1     Running     0          15h
rook-ceph-crashcollector-rhel1-2.jnk-vu1rs33-t1.qe.rh-ocs.97jj2   1/1     Running     0          15h
rook-ceph-drain-canary-rhel1-0.jnk-vu1rs33-t1.qe.rh-ocs.cog5jnh   1/1     Running     0          15h
rook-ceph-drain-canary-rhel1-1.jnk-vu1rs33-t1.qe.rh-ocs.co4qlqn   1/1     Running     0          15h
rook-ceph-drain-canary-rhel1-2.jnk-vu1rs33-t1.qe.rh-ocs.co26mjp   1/1     Running     0          15h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-747f6887m9zg8   1/1     Running     0          15h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-b9d87b86x7kfx   1/1     Running     0          15h
rook-ceph-mgr-a-86cfcfdd58-92g9c                                  1/1     Running     0          15h
rook-ceph-mon-a-56f67f5968-6j4px                                  1/1     Running     0          15h
rook-ceph-mon-d-849d8759b4-x2xw2                                  1/1     Running     0          15h
rook-ceph-mon-f-678c96bbf6-grh6p                                  1/1     Running     0          15h
rook-ceph-operator-54fd57594-9gqnm                                1/1     Running     0          15h
rook-ceph-osd-0-788dccc6ff-rlc9t                                  1/1     Running     0          15h
rook-ceph-osd-1-6cb468b895-lxmh8                                  1/1     Running     0          15h
rook-ceph-osd-2-76f8f48f68-rh6f8                                  1/1     Running     0          15h
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-q8k4g-kj9bb          0/1     Completed   0          15h
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-hswzn-hqx96          0/1     Completed   0          15h
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-s6fhd-4vct7          0/1     Completed   0          15h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-65f674cdj967   1/1     Running     0          15h


So there are 6 PVCs for mons, but only 3 mon pods are actually running.

Version of all relevant components (if applicable):
OCP 4.4


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3


Is this issue reproducible?
Not sure. I think we have seen it in the past.

Can this issue be reproduced from the UI?
Not sure


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCS on VMware
2. Not sure whether it is always reproducible



Actual results:
6 PVCs but only 3 mons running.


Expected results:
The number of mon PVCs should match the number of existing mon pods, IMO.

Additional info:
Jenkins job:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/8017/console
Logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vu1rs33-t1/jnk-vu1rs33-t1_20200525T161512/logs/failed_testcase_ocs_logs_1590423591/deployment_ocs_logs/

Comment 2 Yaniv Kaul 2020-06-01 14:35:37 UTC
Do you see it in other setups? Post deployment? After some tests? After negative tests?

Comment 3 Petr Balogh 2020-06-04 09:54:09 UTC
As this is not critical and basically does not break anything, we do not have any test for it, AFAIK.

I noticed it in some of our deployments and remembered this was not the first time, hence I opened the BZ. Since we do not currently check that len(mon pods) == len(mon PVCs), our deployments will never fail because of this. If it is decided to put that check in place, the deployment will fail whenever we hit this issue again.
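
A minimal sketch of the len(mon pods) == len(mon PVCs) check mentioned above, assuming the tabular output of `oc get pvc` and `oc get pod` has already been captured as text. The function names and the name patterns are illustrative assumptions based on the names seen in this report; they are not part of any existing test suite.

```python
import re

# Assumed naming, taken from the listings in this report:
#   PVC: rook-ceph-mon-<id>
#   pod: rook-ceph-mon-<id>-<replicaset hash>-<pod suffix>
MON_PVC_RE = re.compile(r"^rook-ceph-mon-([a-z]+)$")
MON_POD_RE = re.compile(r"^rook-ceph-mon-([a-z]+)-[a-z0-9]+-[a-z0-9]+$")


def names_from_table(table_text):
    """Return the first column (NAME) of `oc get` output, without the header."""
    names = []
    for line in table_text.strip().splitlines():
        fields = line.split()
        if fields and fields[0] != "NAME":
            names.append(fields[0])
    return names


def mon_ids(names, pattern):
    """Collect the mon IDs (a, b, c, ...) of all names matching the pattern."""
    ids = set()
    for name in names:
        match = pattern.match(name)
        if match:
            ids.add(match.group(1))
    return ids


def unused_mon_pvcs(pvc_table, pod_table):
    """Return the sorted names of mon PVCs that have no matching mon pod."""
    pvc_ids = mon_ids(names_from_table(pvc_table), MON_PVC_RE)
    pod_ids = mon_ids(names_from_table(pod_table), MON_POD_RE)
    return [f"rook-ceph-mon-{mon_id}" for mon_id in sorted(pvc_ids - pod_ids)]
```

Fed the two listings from the description, this would report the PVCs for mons b, c and e as unused, since only mons a, d and f have running pods.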

I think Neha knows more details about this. Neha, can you please add more info if you've seen this in other cases as well?


When I opened this BZ, I was hoping to get some input from Travis on whether this can be handled at the operator level.

Comment 5 Michael Adam 2020-06-15 14:42:55 UTC
Moving to rook, as creation of mon-PVCs is rook's responsibility.

Comment 6 Michael Adam 2020-06-15 14:46:37 UTC
(In reply to Petr Balogh from comment #3)
> As this is not critical and basically do not break anything

As this is not critical, why is it proposed for 4.4.z?

Moving to 4.5 for now.

@Travis can you confirm what is the intended behavior in rook?

Comment 7 Travis Nielsen 2020-06-15 16:33:11 UTC
Rook should be deleting the PVC that is no longer needed for the mon that was failed over. This is not currently implemented; we definitely need to fix it.

Comment 8 Travis Nielsen 2020-06-24 21:48:22 UTC
Upstream fix: https://github.com/rook/rook/pull/5698

Comment 16 Travis Nielsen 2020-07-17 18:41:28 UTC
@Neha The code path for starting mons the first time when there is a long delay and it creates mons a,d,e is much different from the case where quorum is first established with a,b,c and then failover occurs later. This has long been a bug in Rook where b,c are skipped if the first reconcile takes too long. 

How about if we consider this issue verified for 4.5 and we open a new BZ for the remaining work? Fixing that scenario will require more than just deleting the unused PVCs. Since it is a bigger change I'd suggest opening it for 4.6.

Comment 17 Petr Balogh 2020-07-20 15:02:22 UTC
From my point of view, I am OK with moving it to 4.6, as I don't think this is urgent. But I would also like to hear others' opinions, as was asked of Neha.

Comment 19 Travis Nielsen 2020-07-21 18:59:19 UTC
@Neha Anytime the operator finds that a mon is unhealthy for more than the timeout (10 min), it will attempt to start a new mon and complete the failover.
1. Deleting a mon deployment: The operator might fail over to a new mon, or it might first try to re-create the same mon deployment that is missing. Starting in 4.6 you should see the same mon deployment re-created again, but in 4.5 you will likely see the failover scenario.
2. Node replacement: The mon will be force deleted by the operator if the node is not responding, so the mon will likely already have moved to another node before the failover is triggered after 10 minutes. Again, it just depends on timing.
3. The most reliable way to test the failover scenario is likely to set replicas=0 on the mon deployment. As long as the operator doesn't restart, it won't notice that the deployment doesn't match the desired state of replicas=1.
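
The replicas=0 approach in item 3 could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, the deployment name pattern rook-ceph-mon-<id> and the openshift-storage namespace are assumptions taken from the listings in the description, and the 10 minute value is the timeout mentioned above.

```python
import subprocess

NAMESPACE = "openshift-storage"      # assumption: namespace seen in the description
FAILOVER_TIMEOUT_SECONDS = 10 * 60   # mon timeout mentioned in this comment


def scale_mon_command(mon_id, replicas, namespace=NAMESPACE):
    """Build the `oc scale` argv for one mon deployment (hypothetical helper)."""
    return [
        "oc", "scale", "deployment", f"rook-ceph-mon-{mon_id}",
        f"--replicas={replicas}", "-n", namespace,
    ]


def list_pvcs_command(namespace=NAMESPACE):
    """Build the `oc get pvc` argv used to inspect the PVCs afterwards."""
    return ["oc", "get", "pvc", "-n", namespace]


def run(command):
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run(scale_mon_command("b", 0)) would stop mon b. After waiting longer than FAILOVER_TIMEOUT_SECONDS, the output of run(list_pvcs_command()) is expected to no longer list rook-ceph-mon-b once the fix is in place.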

Comment 20 Travis Nielsen 2020-07-27 22:06:36 UTC
*** Bug 1859960 has been marked as a duplicate of this bug. ***

Comment 21 Aviad Polak 2020-08-19 11:19:49 UTC
with builds:
ocs: ocs-operator.v4.5.0-54.ci 
ocp: 4.5.0-0.nightly-2020-08-15-052753

On VMware, tested by deleting the deployment, deleting the pod, and setting replicas=0. Looks good.
Moving to verified.

Comment 23 errata-xmlrpc 2020-09-15 10:17:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

