Bug 1893747 - OCS uninstall should check for Volumesnapshots before proceeding with graceful Uninstall
Keywords:
Status: CLOSED DUPLICATE of bug 1968510
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Raghavendra Talur
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks: 1882359 1968510
 
Reported: 2020-11-02 14:05 UTC by Neha Berry
Modified: 2023-08-09 17:00 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1968510 (view as bug list)
Environment:
Last Closed: 2021-09-24 16:25:49 UTC
Embargoed:


Attachments
ocs-operator-logs (2.23 MB, text/plain)
2020-11-02 14:05 UTC, Neha Berry

Description Neha Berry 2020-11-02 14:05:49 UTC
Created attachment 1725806 [details]
ocs-operator-logs

Description of problem (please be detailed as possible and provide log
snippets):
------------------------------------------------------------------------

With the graceful mode of uninstall, if the OCS cluster has PVCs or OBCs carved out of Ceph resources, the StorageCluster uninstall blocks and waits until they are removed before proceeding.

As part of uninstall, the VolumeSnapshotClass gets deleted. But in the absence of the VolumeSnapshotClass, VolumeSnapshot deletions hang permanently, and force deletion does not clear the leftovers either (unlike PVC/PV).

In both uninstall modes (graceful and forced), we do not check for the presence of VolumeSnapshots before deleting the VolumeSnapshotClass. This can lead to Bug 1893739, where the user is unable to force delete the leftover VolumeSnapshots and VolumeSnapshotContents.

Version of all relevant components (if applicable):
-------------------------------------------------------
OCS = ocs-operator.v4.6.0-147.ci
OCP = 4.6.0-0.nightly-2020-10-22-034051


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
--------------------------------------------
No, but it can cause leftovers which are difficult to clean up - see bug 1893739.

Is there any workaround available to the best of your knowledge?
-----------------------------------------------
Not sure
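
A possible manual cleanup (an assumption on my part, not verified against this bug): clear the finalizers on the stuck objects so the API server can garbage-collect them even though the CSI driver has lost Ceph access. Sketch only; the namespace and the guard around `oc` are mine:

```shell
#!/bin/sh
# Hypothetical cleanup sketch -- NOT verified against this bug.
# Assumes the stuck VolumeSnapshots/VolumeSnapshotContents are only
# held back by their finalizers once the VolumeSnapshotClass is gone.
NS=default
PATCH='{"metadata":{"finalizers":null}}'
echo "patch payload: $PATCH"

if command -v oc >/dev/null 2>&1; then
  # Clear finalizers so the API server can delete the objects.
  for vs in $(oc get volumesnapshot -n "$NS" -o name); do
    oc patch "$vs" -n "$NS" --type merge -p "$PATCH"
  done
  for vsc in $(oc get volumesnapshotcontent -o name); do
    oc patch "$vsc" --type merge -p "$PATCH"
  done
else
  echo "oc not available; nothing patched (sketch only)"
fi
```

Removing finalizers bypasses the snapshot controller's cleanup entirely, so the underlying Ceph snapshots (if any survive) are orphaned; this is only reasonable after the CephCluster is already gone.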

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
-------------------------------------
4

Is this issue reproducible?
-----------------------------
tested once, but probably yes

Can this issue reproduce from the UI?
------------------------------------------
NA

If this is a regression, please provide more details to justify this:
----------------------------------------------------
No .

Steps to Reproduce:
-----------------------
1. Create an OCS 4.6 cluster with OCP 4.6
2. Create one each of CephFS and RBD PVCs and create snapshots using the default VS classes
3. To initiate OCS uninstall, delete the OBCs and PVCs but do not delete the VolumeSnapshots

4. Delete the StorageCluster, which in turn deletes the VolumeSnapshotClass

$ oc delete -n openshift-storage storagecluster --all --wait=true
$ oc get volumesnapshotclass

5. Try to delete the leftover VolumeSnapshots now that the CephCluster is already gone (no Ceph access)

$ oc delete volumesnapshot -n <project-name> --all --force --grace-period=0


6. See if the VS deletion succeeds

$ oc get volumesnapshotcontent -A

$ oc get volumesnapshot -A

Actual results:
---------------------
The VolumeSnapshots fail to get deleted, even with the force option, because OCS uninstall deleted the VolumeSnapshotClass even though VolumeSnapshots still existed.


Expected results:
-------------------------
At least in graceful mode, we should check for the presence of VolumeSnapshots as well before allowing the StorageCluster deletion to complete.
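
Such a guard could look like the sketch below. The function name and the inline sample data are mine (the sample mirrors the "Additional info" output further down); the check operates on captured `oc get` output so the logic can be exercised without a cluster:

```shell
#!/bin/sh
# Sketch of a pre-uninstall guard (hypothetical -- not part of
# ocs-operator). It counts VolumeSnapshots that still reference the
# OCS snapshot classes and refuses to proceed while any remain.

count_ocs_snapshots() {
  # $1: output of `oc get volumesnapshot -A --no-headers`
  printf '%s\n' "$1" | grep -c 'ocs-storagecluster-[a-z]*plugin-snapclass'
}

# Stand-in for live `oc get` output (two OCS-backed snapshots).
sample='default  test-cephfs-snapshot  false  test-cephfs  2Gi  ocs-storagecluster-cephfsplugin-snapclass  snapcontent-bc40d6e8  36h  36h
default  test-rbd-snapshot  false  test-rbd  5Gi  ocs-storagecluster-rbdplugin-snapclass  snapcontent-602939aa  36h  36h'

n=$(count_ocs_snapshots "$sample")
echo "OCS-backed VolumeSnapshots remaining: $n"
if [ "$n" -gt 0 ]; then
  echo "refusing to delete StorageCluster until snapshots are removed"
fi
```

In the real operator this would run against the live API before the VolumeSnapshotClass is removed, mirroring the existing wait on PVCs and OBCs.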


Additional info:
--------------------------
$ oc get volumesnapshotclass
No resources found

$ oc delete volumesnapshot -n default --all --force --grace-period=0
----

$ oc get volumesnapshot -A
NAMESPACE   NAME                   READYTOUSE   SOURCEPVC     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                               SNAPSHOTCONTENT                                    CREATIONTIME   AGE
default     test-cephfs-snapshot   false        test-cephfs                           2Gi           ocs-storagecluster-cephfsplugin-snapclass   snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   36h            36h
default     test-rbd-snapshot      false        test-rbd                              5Gi           ocs-storagecluster-rbdplugin-snapclass      snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   36h            36h


$ oc get volumesnapshotcontent -A
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT         AGE
snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   true         5368709120    Delete           openshift-storage.rbd.csi.ceph.com      ocs-storagecluster-rbdplugin-snapclass      test-rbd-snapshot      36h
snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   true         2147483648    Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   test-cephfs-snapshot   36h

Comment 5 Yaniv Kaul 2020-11-11 07:24:30 UTC
Severity is NOT high. It's an uninstall, with a specific case of snapshots. Not a big deal. Let's not abuse the severity field.

Comment 6 Mudit Agarwal 2020-11-12 09:41:55 UTC
Moving this out. Had an offline discussion with Talur; this is not a blocker and needs more work, not something we can fix in 4.6.

Comment 8 Jose A. Rivera 2021-02-08 15:23:07 UTC
This is not critical to the product for OCS 4.7, and there is already sufficient documentation to deal with this manually. Moving to OCS 4.8.

Also giving devel_ack+, since we should do this anyway.

Comment 9 Jose A. Rivera 2021-06-07 15:42:20 UTC
Since the associated BZ in rook has been kicked to ODF 4.9, this one should do the same.

Comment 10 Mudit Agarwal 2021-09-24 16:25:49 UTC
No code change required, already fixed in 4.9

*** This bug has been marked as a duplicate of bug 1968510 ***

