Created attachment 1725806
ocs-operator-logs

Description of problem (please be as detailed as possible and provide log snippets):
------------------------------------------------------------------------
With graceful-mode uninstall, if the OCS cluster has PVCs or OBCs carved out of Ceph resources, the uninstall of the storagecluster gets stuck and waits until they are removed.

As part of uninstall, the volumesnapshotclass gets deleted. But in the absence of the VS class, volumesnapshot deletions get stuck permanently, and force deletion does not clear the leftovers either (unlike PVC/PV).

In both uninstall scenarios (graceful and forced), we do not check for the presence of Volumesnapshots before proceeding with deletion of the VS class. This could lead to Bug 1893739, where the user might not be able to force delete the leftover Volumesnapshots and VolumeSnapshotContents.

Version of all relevant components (if applicable):
-------------------------------------------------------
OCS = ocs-operator.v4.6.0-147.ci
OCP = 4.6.0-0.nightly-2020-10-22-034051

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
--------------------------------------------
No, but it can cause leftovers that are difficult to clean up (bug 1893739).

Is there any workaround available to the best of your knowledge?
-----------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
-------------------------------------
4

Is this issue reproducible?
-----------------------------
Tested once, but probably yes

Can this issue be reproduced from the UI?
------------------------------------------
NA

If this is a regression, please provide more details to justify this:
----------------------------------------------------
No

Steps to Reproduce:
-----------------------
1. 
Create an OCS 4.6 cluster with OCP 4.6
2. Create one each of CephFS and RBD PVCs and create snapshots using the default VS classes
3. To initiate OCS uninstall, delete the OBCs and PVCs but do not delete the VS
4. Delete the StorageCluster, which in turn deletes the Volumesnapshot class
   $ oc delete -n openshift-storage storagecluster --all --wait=true
   $ oc get volumesnapshotclass
5. Try to delete the dangling leftover Volumesnapshots now that the Cephcluster is already gone (no Ceph access)
   $ oc delete volumesnapshot -n <project-name> --all --force --grace-period=0
6. See if the VS deletion succeeds
   $ oc get volumesnapshotcontent -A
   $ oc get volumesnapshot -A

Actual results:
---------------------
The Volumesnapshots fail to get deleted, even with the force option, because OCS uninstall deleted the Volumesnapshot class while Volumesnapshots still existed.

Expected results:
-------------------------
At least for graceful mode, we should also check for the presence of Volumesnapshots before the storagecluster deletion is allowed to succeed.
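
For illustration only, here is a minimal sketch of the kind of pre-check described under "Expected results", written as a shell helper that could be run before deleting the storagecluster. The function name `blocking_snapshots` is hypothetical and this is not how ocs-operator implements (or would implement) the check; the two class names are the default ones shown in this report, and `.spec.volumeSnapshotClassName` is assumed to be the field carrying the class on each Volumesnapshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ocs-operator): reads lines of
# "<namespace> <name> <snapshotclass>" on stdin and prints every snapshot
# whose class is one of the class names passed as arguments.
blocking_snapshots() {
  local classes=" $* "   # pad with spaces so only whole class names match
  local ns name class
  while read -r ns name class; do
    [ -n "$class" ] || continue   # skip blank or incomplete lines
    case "$classes" in
      *" $class "*) printf '%s/%s (%s)\n' "$ns" "$name" "$class" ;;
    esac
  done
}

# Example of feeding it live data (assumes a logged-in `oc` session):
#   oc get volumesnapshot -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.volumeSnapshotClassName \
#     | blocking_snapshots ocs-storagecluster-rbdplugin-snapclass \
#                          ocs-storagecluster-cephfsplugin-snapclass
```

If the helper prints anything, Volumesnapshots still reference the OCS snapshot classes, and those would need to be deleted while the Cephcluster is still reachable, before the storagecluster is removed.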

Additional info:
--------------------------
$ oc get volumesnapshotclass
No resources found

$ oc delete volumesnapshot -n default --all --force --grace-period=0
----

$ oc get volumesnapshot -A
NAMESPACE   NAME                   READYTOUSE   SOURCEPVC     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                               SNAPSHOTCONTENT                                    CREATIONTIME   AGE
default     test-cephfs-snapshot   false        test-cephfs                           2Gi           ocs-storagecluster-cephfsplugin-snapclass   snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   36h            36h
default     test-rbd-snapshot      false        test-rbd                              5Gi           ocs-storagecluster-rbdplugin-snapclass      snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   36h            36h

$ oc get volumesnapshotcontent -A
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT         AGE
snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   true         5368709120    Delete           openshift-storage.rbd.csi.ceph.com      ocs-storagecluster-rbdplugin-snapclass      test-rbd-snapshot      36h
snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   true         2147483648    Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   test-cephfs-snapshot   36h
Severity is NOT high. It's an uninstall, with a specific case of snapshots. Not a big deal. Let's not abuse the severity field.
Moving this out. Had an offline discussion with Talur: this is not a blocker, it needs more work, and it is not something we can fix in 4.6.
This is not critical to the product for OCS 4.7, and there is already sufficient documentation to deal with this manually. Moving to OCS 4.8. Also giving devel_ack+, since we should do this anyway.
Since the associated BZ in rook has been kicked to ODF 4.9, this one should do the same.
No code change required, already fixed in 4.9.

*** This bug has been marked as a duplicate of bug 1968510 ***