Bug 1889866 - Post node power off/on, an unused MON PVC remains behind in the cluster [NEEDINFO]
Summary: Post node power off/on, an unused MON PVC remains behind in the cluster
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat
Component: rook
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Travis Nielsen
QA Contact: Martin Bukatovic
Depends On:
Reported: 2020-10-20 18:17 UTC by Neha Berry
Modified: 2020-11-19 18:23 UTC
CC List: 7 users

Fixed In Version: 4.6.0-148.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:
mbukatov: needinfo? (nberry)

Attachments

System ID Priority Status Summary Last Updated
Github openshift rook pull 139 None closed Bug 1889866: Check for orphaned mon resources with every reconcile 2020-11-27 13:40:46 UTC
Github rook rook pull 6493 None closed ceph: Check for orphaned mon resources with every reconcile 2020-11-27 13:40:46 UTC

Description Neha Berry 2020-10-20 18:17:59 UTC
Description of problem (please be as detailed as possible and provide logs):
OCS 4.5.1-rc2, OCP 4.6 (4.6.0-0.nightly-2020-10-17-040148). Mons in quorum: a, b, d.

While trying to test mon failover in case of node shutdown, I powered off a node (compute-2) for >10 mins, and as expected:

a) The terminating mon-b was removed by the operator.

b) Mon-e tried to get scheduled at Tue Oct 20 16:32:53 UTC 2020, but stayed in Pending state since this is a 3-worker-node cluster.

I powered the node back on after >15-20 mins, and:

a) mon-b reconciled and came back up (the original mon).
b) The mon-e canary pod was deleted by the operator.
c) But the leftover mon-e PVC was not deleted by the rook operator, and it still exists in the cluster:

2020-10-20 16:41:54.445910 I | op-k8sutil: Removed deployment rook-ceph-mon-e-canary
2020-10-20 16:41:54.448495 I | op-k8sutil: "rook-ceph-mon-e-canary" still found. waiting...
2020-10-20 16:41:56.451648 I | op-k8sutil: confirmed rook-ceph-mon-e-canary does not exist

rook-ceph-mon-a                     Bound    pvc-7cb64575-f1be-403b-a1c8-2952aa805c9a   10Gi       RWO            thin                          4h7m
rook-ceph-mon-b                     Bound    pvc-5fd11125-426d-4cb8-8bac-27e770a15d91   10Gi       RWO            thin                          4h7m
rook-ceph-mon-d                     Bound    pvc-53990c83-9583-4cf7-9225-f469ced86ae6   10Gi       RWO            thin                          110m
rook-ceph-mon-e                     Bound    pvc-71e43b77-edf3-4cb4-ac34-4fe86e0cc023   10Gi       RWO            thin                          84m

As part of the fix for Bug 1840084, this extra mon PVC should have been deleted.

Version of all relevant components (if applicable):
OCS = ocs-operator.v4.5.1-130.ci
OCP = 4.6.0-0.nightly-2020-10-17-040148

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
We can delete the mon PVC manually; the deployment was already deleted.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

Can this issue reproducible?

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:
It could be an edge case not covered by the fix for Bug 1840084.

Steps to Reproduce:
1. Create a 3 node OCS 4.5.1 cluster
2. Power off one node (preferably one that is not hosting the rook-ceph-operator)
3. Wait for >15 mins and check that a new mon-canary pod is also in Pending state
4. Power ON the node
5. Check if the original mon gets restored. If yes, confirm whether the PVC created for the new incremental mon is deleted by the operator or not

Actual results:
The mon PVC of the incremental mon is not deleted and stays behind.

Expected results:
Any unused mon PVC should get deleted

Additional info:
1. The rook-ceph-operator pod was not restarted throughout this scenario.
2. Tested similar behavior on 4.6; in that scenario the rook operator pod was respinned once the node was back up. There was still a leftover PVC, even though the mon that ultimately came up was the incremental mon.

When node was powered off
rook-ceph-mon-a-8d75868b8-whsmp                                   1/1     Running       0          79m    compute-0   <none>           <none>
rook-ceph-mon-b-649df78586-rw9kv                                  0/1     Pending       0          6m22s   <none>         <none>      <none>           <none>
rook-ceph-mon-d-845f84d78-mjhhq                                   1/1     Running       0          26m    compute-1   <none>           <none>
rook-ceph-mon-e-canary-89998fdc-5959m                             0/1     Pending       0          54s     <none>         <none>      <none>           <none>

>> PVC

rook-ceph-mon-a                     Bound    pvc-7cb64575-f1be-403b-a1c8-2952aa805c9a   10Gi       RWO            thin                          163m
rook-ceph-mon-b                     Bound    pvc-5fd11125-426d-4cb8-8bac-27e770a15d91   10Gi       RWO            thin                          163m
rook-ceph-mon-d                     Bound    pvc-53990c83-9583-4cf7-9225-f469ced86ae6   10Gi       RWO            thin                          26m
rook-ceph-mon-e                     Bound    pvc-71e43b77-edf3-4cb4-ac34-4fe86e0cc023   10Gi       RWO            thin                          58s

When node was powered back ON
rook-ceph-mon-a-8d75868b8-whsmp                                   1/1     Running     0          162m    compute-0   <none>           <none>
rook-ceph-mon-b-649df78586-rw9kv                                  1/1     Running     0          89m    compute-2   <none>           <none> >>>>> original mon ultimately recovers
rook-ceph-mon-d-845f84d78-mjhhq                                   1/1     Running     0          109m    compute-1   <none>           <none>


rook-ceph-mon-a                     Bound    pvc-7cb64575-f1be-403b-a1c8-2952aa805c9a   10Gi       RWO            thin                          4h7m
rook-ceph-mon-b                     Bound    pvc-5fd11125-426d-4cb8-8bac-27e770a15d91   10Gi       RWO            thin                          4h7m
rook-ceph-mon-d                     Bound    pvc-53990c83-9583-4cf7-9225-f469ced86ae6   10Gi       RWO            thin                          110m
rook-ceph-mon-e                     Bound    pvc-71e43b77-edf3-4cb4-ac34-4fe86e0cc023   10Gi       RWO            thin                          84m

Comment 2 Travis Nielsen 2020-10-20 18:25:55 UTC
The deletion of unused (orphaned) mon PVCs is currently only done after a successful mon failover. The orphaned mon PVC remains in this case because the mon failover was cancelled since the original mon came back up. Agreed that Rook should clean up the PVC sooner, but moving this to 4.7 since it doesn't affect functionality.

Comment 4 Raz Tamir 2020-10-26 14:31:03 UTC
As discussed in the leads meeting, the fix doesn't seem to be risky so we will first fix it in 4.7 and follow with the same in 4.6.
Proposing as a blocker for now, but in case @Travis updates that the fix is not that straightforward, feel free to move it back to 4.7.


Comment 5 Travis Nielsen 2020-10-26 20:09:37 UTC
The fix is low risk: move the check for orphaned resources to every mon reconcile instead of running it only after a successful mon failover.
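The core of the check is straightforward set arithmetic: any mon PVC whose mon is no longer part of the desired deployment set is an orphan and can be deleted. The sketch below is a hypothetical illustration of that logic (the function name and types are invented for this example; the real Rook implementation in PR 6493 operates on Kubernetes API objects):

```go
package main

import "fmt"

// findOrphanedMonPVCs returns the names of mon PVCs that no longer back an
// active mon deployment. Hypothetical sketch; the real Rook reconcile lists
// PVCs by label selector and compares them against the current mon set.
func findOrphanedMonPVCs(pvcNames []string, activeMons map[string]bool) []string {
	var orphans []string
	for _, pvc := range pvcNames {
		// A PVC like "rook-ceph-mon-e" is orphaned when mon "e" is not
		// part of the current quorum/deployment set.
		if !activeMons[pvc] {
			orphans = append(orphans, pvc)
		}
	}
	return orphans
}

func main() {
	// State from the bug report: mons a, b, d are active, but the PVC for
	// the cancelled failover mon "e" is still present.
	pvcs := []string{"rook-ceph-mon-a", "rook-ceph-mon-b", "rook-ceph-mon-d", "rook-ceph-mon-e"}
	active := map[string]bool{
		"rook-ceph-mon-a": true,
		"rook-ceph-mon-b": true,
		"rook-ceph-mon-d": true,
	}
	fmt.Println(findOrphanedMonPVCs(pvcs, active)) // only mon-e has no deployment
}
```

Running this check on every reconcile (rather than only after a successful failover) is what catches the cancelled-failover case from this bug, since the orphaned PVC is found the next time the operator reconciles the mons.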

Comment 8 Martin Bukatovic 2020-11-05 23:54:26 UTC
Testing with

OCP 4.6.0-0.nightly-2020-11-05-024238
OCS ocs-operator.v4.6.0-624.ci

On GCP (a cloud, IPI platform).

Full version report

storage namespace openshift-cluster-storage-operator
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a2d75eb606e8cbf2fa0d203bfbc92e3db822286357c46d039ba74080c2dc08f
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a2d75eb606e8cbf2fa0d203bfbc92e3db822286357c46d039ba74080c2dc08f
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:423e5b0624ed0bb736c5320c37611b72dcbb2094e785c2ab588f584f65157289
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:423e5b0624ed0bb736c5320c37611b72dcbb2094e785c2ab588f584f65157289
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c02fd4013a52b3d3047ae566f4e7e50c82c1087cb3acc59945cd01d718235e94
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c02fd4013a52b3d3047ae566f4e7e50c82c1087cb3acc59945cd01d718235e94

storage namespace openshift-kube-storage-version-migrator
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:da72171372d59ebbd8319073640716c7777a945848a39538224354b1566a0b02
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:da72171372d59ebbd8319073640716c7777a945848a39538224354b1566a0b02

storage namespace openshift-kube-storage-version-migrator-operator
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:60ff0a413ba64ee38c13f13902071fc7306f24eb46edcacc8778507cf78f15ef
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:60ff0a413ba64ee38c13f13902071fc7306f24eb46edcacc8778507cf78f15ef

storage namespace openshift-storage
image quay.io/rhceph-dev/cephcsi@sha256:3b2fff211845eab398d66262a4c47eb5eadbcd982de80387aa47dd23f6572b22
 * quay.io/rhceph-dev/cephcsi@sha256:3b2fff211845eab398d66262a4c47eb5eadbcd982de80387aa47dd23f6572b22
image quay.io/rhceph-dev/ose-csi-node-driver-registrar@sha256:4cf9fb2d021b0ce409ef7fdf2d4b182f655950ba28cb822ffc4549de422d4184
 * quay.io/rhceph-dev/ose-csi-node-driver-registrar@sha256:30b3c4f21074d323f5d62500af63251f41f96193907b953a742bfb9067d05114
image quay.io/rhceph-dev/ose-csi-external-attacher@sha256:87db9cca0c2e58343e1ca60e9ae4294f115515e7724984de30207b1205ed3611
 * quay.io/rhceph-dev/ose-csi-external-attacher@sha256:79d85b1739ef751175cc33ca15e5d979f4bdf0fa5f41b9b7e66d58015b9af6b8
image quay.io/rhceph-dev/ose-csi-external-provisioner@sha256:376ee9cf355554a3174e12329545d1a89ed0228ac2597adbd282ae513dbb84e8
 * quay.io/rhceph-dev/ose-csi-external-provisioner@sha256:376ee9cf355554a3174e12329545d1a89ed0228ac2597adbd282ae513dbb84e8
image quay.io/rhceph-dev/ose-csi-external-resizer@sha256:136a81c87028a8f7e6c1c579923548b36dbf034e4dd24215e1739ac484e7382b
 * quay.io/rhceph-dev/ose-csi-external-resizer@sha256:136a81c87028a8f7e6c1c579923548b36dbf034e4dd24215e1739ac484e7382b
image quay.io/rhceph-dev/ose-csi-external-snapshotter@sha256:90f9dd56fa26339f6d4ff81c7e94794c237ba0963f660480d129c67becdc5e5f
 * quay.io/rhceph-dev/ose-csi-external-snapshotter@sha256:612307360e8c6bb8994087fc1c44d0e8a35a9e6d5d45b5803d77dd32820484ad
image quay.io/rhceph-dev/mcg-core@sha256:01975cd563b7e802973a8dc4f0b79b43df070f666c7993ab51cf3aefda39002a
 * quay.io/rhceph-dev/mcg-core@sha256:01975cd563b7e802973a8dc4f0b79b43df070f666c7993ab51cf3aefda39002a
image registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:6abfa44b8b4d7b45d83b1158865194cb64481148701977167e900e5db4e1eba3
 * registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:6abfa44b8b4d7b45d83b1158865194cb64481148701977167e900e5db4e1eba3
image quay.io/rhceph-dev/mcg-operator@sha256:a293f3c5933a28812b84e2fe90de40ad64ad0207660787b66e168303b0aafaac
 * quay.io/rhceph-dev/mcg-operator@sha256:4ac7bc0e54d6190ece9cbc4c81e0644711f1adbb65fda48a2b43a9ab3b256aa1
image quay.io/rhceph-dev/ocs-operator@sha256:7ba5917c82bd08472a221c4bc12f054fdc66fb02fc36ff59270554ca61379da1
 * quay.io/rhceph-dev/ocs-operator@sha256:7ba5917c82bd08472a221c4bc12f054fdc66fb02fc36ff59270554ca61379da1
image quay.io/rhceph-dev/rhceph@sha256:22ea8ee38cd8283f636c2eeb640eb4a1bb744efb18abee114517926f4a03bff9
 * quay.io/rhceph-dev/rhceph@sha256:22ea8ee38cd8283f636c2eeb640eb4a1bb744efb18abee114517926f4a03bff9
image quay.io/rhceph-dev/rook-ceph@sha256:c14792c0e59cf7866b6a19c970513071d0ea106b28e79733a2d26240adb507cd
 * quay.io/rhceph-dev/rook-ceph@sha256:c14792c0e59cf7866b6a19c970513071d0ea106b28e79733a2d26240adb507cd


1. Deployed a 3 node OCS 4.6 cluster on GCP
2. Stopped one worker node from GCP Console, so that rook-ceph-operator and
   ocs-operator were not affected by this.
3. Waited for about half an hour; saw a new mon-canary pod and a new mon PVC,
   both in Pending state.
4. Started the node again.
5. A new mon pod was deployed, and the PVC of the removed mon is no longer present.

Final state:

$ oc get pods -n openshift-storage | grep mon-
rook-ceph-mon-a-775c887788-5fs6d                                  1/1     Running     0          77m
rook-ceph-mon-c-5fcf9dbc58-mdz8s                                  1/1     Running     0          76m
rook-ceph-mon-d-54d89bd4d9-r9wpm                                  1/1     Running     0          10m
$ oc get pvc -n openshift-storage | grep mon-
rook-ceph-mon-a                         Bound    pvc-61977a0b-1f49-4632-a63f-44539bc3c26a   10Gi       RWO            standard                      80m
rook-ceph-mon-c                         Bound    pvc-53b08463-97e3-43c0-b933-29924bfef914   10Gi       RWO            standard                      80m
rook-ceph-mon-d                         Bound    pvc-4369dbac-b6db-4887-bd2e-cc8912bdae20   10Gi       RWO            standard                      42m

So on the one hand, there is no pending mon PVC, but on the other hand, I seem
to observe a successful mon failover, which means I haven't reproduced the bug
following the original reproducer.

Comment 9 Martin Bukatovic 2020-11-05 23:55:36 UTC
Asking original dev contact and reporter to update the reproducer to inflict an unsuccessful mon failover on OCS 4.6.

Comment 10 Martin Bukatovic 2020-11-05 23:56:07 UTC
Asking original reporter to update the reproducer to inflict an unsuccessful mon failover on OCS 4.6.

Comment 11 Travis Nielsen 2020-11-09 15:31:49 UTC
Given another improvement in the mon failover, there isn't a good way to get a mon failover to fail. The whole point of the operator is to succeed when it performs actions, so apparently the operator is getting too good to simulate failure.

The way I verified the fix was to manually create a PVC with labels similar to the other mon PVCs (but with no pod mounting it), so that the operator would find it at the next reconcile and delete it.
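A decoy PVC for this verification might look like the sketch below. The labels and storage class shown are illustrative assumptions, not the exact selector the operator uses; to reproduce the verification, copy the labels from an existing mon PVC (e.g. `oc get pvc rook-ceph-mon-a -n openshift-storage -o yaml`) so the operator's label selector actually matches it:

```yaml
# Hypothetical decoy mon PVC for verifying orphan cleanup.
# Labels below are assumed; mirror them from a real mon PVC in your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-ceph-mon-z
  namespace: openshift-storage
  labels:
    app: rook-ceph-mon   # assumed label; verify against an existing mon PVC
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: thin
  resources:
    requests:
      storage: 10Gi
```

Since no mon deployment references "z" and no pod mounts the claim, the operator should treat it as orphaned and delete it on the next mon reconcile.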

Comment 13 Elad 2020-11-11 19:50:11 UTC
1st option + regression cycle of disruptive testing should suffice

Comment 15 Travis Nielsen 2020-11-14 00:07:01 UTC
In comment 11 I mentioned a way to simulate an orphaned PVC from a failed mon failover. If simulating the repro isn't valid, then we must go with option 1, since reliably reproducing a failed mon failover in a real cluster is so difficult. Retracting the fix should not be done IMO unless a regression is found.

Comment 17 Martin Bukatovic 2020-11-19 18:23:04 UTC
Based on comments 15, 13, 12 and 8, marking as verified (with limitations described in comment 12).
