Bug 1893747

Summary: OCS uninstall should check for Volumesnapshots before proceeding with graceful Uninstall
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Neha Berry <nberry>
Component: ocs-operator
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED DUPLICATE
QA Contact: Raz Tamir <ratamir>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.6
CC: ebenahar, jarrpa, madam, mrajanna, muagarwa, ocs-bugs, odf-bz-bot, sostapov, tnielsen
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned By: 1968510 (view as bug list)
Environment:
Last Closed: 2021-09-24 16:25:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:
Bug Depends On:    
Bug Blocks: 1882359, 1968510    
Attachments: ocs-operator-logs (flags: none)

Description Neha Berry 2020-11-02 14:05:49 UTC
Created attachment 1725806
ocs-operator-logs

Description of problem (please be as detailed as possible and provide log
snippets):
------------------------------------------------------------------------

With graceful uninstall, if the OCS cluster has PVCs or OBCs carved out of Ceph resources, the StorageCluster uninstall gets stuck and waits until they are removed before proceeding.

As part of uninstall, the VolumeSnapshotClass gets deleted. But in the absence of the VolumeSnapshotClass, VolumeSnapshot deletions get stuck permanently, and force deletion does not clear the leftovers either (unlike with PVC/PV).

In both uninstall scenarios (graceful / forced), we do not check whether VolumeSnapshots are present and proceed with deletion of the VolumeSnapshotClass. This can end in Bug 1893739, where the user may not be able to force delete the leftover VolumeSnapshots and VolumeSnapshotContents.
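
The kind of guard being requested could be approximated from the CLI. A minimal sketch, assuming jq is available and that the default OCS snapshot classes share the ocs-storagecluster- name prefix (as in the outputs under Additional info); a non-zero count means a graceful uninstall should wait, and a forced one should at least warn, before removing the VolumeSnapshotClasses:

# Count VolumeSnapshots still backed by OCS snapshot classes
$ oc get volumesnapshot -A -o json \
    | jq '[.items[] | select((.spec.volumeSnapshotClassName // "") | startswith("ocs-storagecluster"))] | length'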

Version of all relevant components (if applicable):
-------------------------------------------------------
OCS = ocs-operator.v4.6.0-147.ci
OCP = 4.6.0-0.nightly-2020-10-22-034051


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
--------------------------------------------
No, but it can cause leftovers which are difficult to clean up - bug 1893739

Is there any workaround available to the best of your knowledge?
-----------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
-------------------------------------
4

Is this issue reproducible?
-----------------------------
Tested once, but probably yes

Can this issue reproduce from the UI?
------------------------------------------
NA

If this is a regression, please provide more details to justify this:
----------------------------------------------------
No.

Steps to Reproduce:
-----------------------
1. Create an OCS 4.6 cluster with OCP 4.6
2. Create one CephFS PVC and one RBD PVC, and create a snapshot of each using the default VolumeSnapshotClasses (see the example manifest below)
3. To initiate OCS uninstall, delete the OBCs and PVCs, but do not delete the VolumeSnapshots
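
For step 2, a manifest of the kind used here (a sketch; the names are taken from the outputs under Additional info, and the v1beta1 apiVersion is an assumption for the snapshot API shipped with OCP 4.6):

$ cat <<EOF | oc apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-rbd-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: test-rbd
EOF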

4. Delete the StorageCluster, which in turn deletes the VolumeSnapshotClasses

$ oc delete -n openshift-storage storagecluster --all --wait=true
$ oc get volumesnapshotclass

5. Try to delete the dangling, leftover VolumeSnapshots; the CephCluster is already gone at this point (no Ceph access)

$ oc delete volumesnapshot -n <project-name> --all --force --grace-period=0


6. Check whether the VolumeSnapshot deletion succeeds

$ oc get volumesnapshotcontent -A

$ oc get volumesnapshot -A

Actual results:
---------------------
The VolumeSnapshots fail to get deleted, even with the force option, because the OCS uninstall deleted the VolumeSnapshotClasses while VolumeSnapshots still existed.
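
The snapshots stay pinned by the snapshot controller's protection finalizers, which can no longer be satisfied once the class and the backing CephCluster are gone. One way to confirm the stuck state (assuming the upstream external-snapshotter finalizers are in play; a non-empty deletionTimestamp together with remaining finalizers means the object is stuck terminating):

$ oc get volumesnapshot test-rbd-snapshot -n default -o jsonpath='{.metadata.finalizers}'
$ oc get volumesnapshot test-rbd-snapshot -n default -o jsonpath='{.metadata.deletionTimestamp}'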


Expected results:
-------------------------
At least for graceful mode, we should also check for the presence of VolumeSnapshots before letting the StorageCluster deletion complete.


Additional info:
--------------------------
$ oc get volumesnapshotclass
No resources found

$ oc delete volumesnapshot -n default --all --force --grace-period=0

$ oc get volumesnapshot -A
NAMESPACE   NAME                   READYTOUSE   SOURCEPVC     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                               SNAPSHOTCONTENT                                    CREATIONTIME   AGE
default     test-cephfs-snapshot   false        test-cephfs                           2Gi           ocs-storagecluster-cephfsplugin-snapclass   snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   36h            36h
default     test-rbd-snapshot      false        test-rbd                              5Gi           ocs-storagecluster-rbdplugin-snapclass      snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   36h            36h


$ oc get volumesnapshotcontent -A
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT         AGE
snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   true         5368709120    Delete           openshift-storage.rbd.csi.ceph.com      ocs-storagecluster-rbdplugin-snapclass      test-rbd-snapshot      36h
snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   true         2147483648    Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   test-cephfs-snapshot   36h
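
Not verified as part of this report, but a possible manual escape from this state would be to either recreate a matching VolumeSnapshotClass so the controller can finish the deletions, or strip the finalizers by hand (accepting that the backing Ceph data is unrecoverable anyway once the CephCluster is deleted):

# Hypothetical cleanup, using the leftover objects listed above
$ oc patch volumesnapshot test-rbd-snapshot -n default --type merge -p '{"metadata":{"finalizers":null}}'
$ oc patch volumesnapshotcontent snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05 --type merge -p '{"metadata":{"finalizers":null}}'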

Comment 5 Yaniv Kaul 2020-11-11 07:24:30 UTC
Severity is NOT high. It's an uninstall, with a specific case of snapshots. Not a big deal. Let's not abuse the severity field.

Comment 6 Mudit Agarwal 2020-11-12 09:41:55 UTC
Moving this out. Had an offline discussion with Talur; this is not a blocker and needs more work, not something we can fix in 4.6.

Comment 8 Jose A. Rivera 2021-02-08 15:23:07 UTC
This is not critical to the product for OCS 4.7, and there is already sufficient documentation to deal with this manually. Moving to OCS 4.8.

Also giving devel_ack+, since we should do this anyway.

Comment 9 Jose A. Rivera 2021-06-07 15:42:20 UTC
Since the associated BZ in rook has been kicked to ODF 4.9, this one should do the same.

Comment 10 Mudit Agarwal 2021-09-24 16:25:49 UTC
No code change required, already fixed in 4.9

*** This bug has been marked as a duplicate of bug 1968510 ***