Description of problem:
---------------------------------
In the case of PVC deletion, nothing blocks the deletion even if the StorageClass has already been deleted. The external provisioner sends a request to the CSI driver with the volumeID, and no secrets are sent because it cannot get the StorageClass name.

In the case of a volume snapshot, however, the VolumeSnapshot deletion never completes when the VolumeSnapshotClass has already been deleted. As per the CSI spec, secrets are optional parameters, so why is the external snapshotter not sending the delete snapshot request to the CSI driver? Is this expected behavior?

Especially in cases of OCS uninstall, if users fail to delete VolumeSnapshots (VS) before deleting the StorageCluster, the VolumeSnapshotClass gets deleted but the VS stay behind. Even a force deletion doesn't delete these dangling VS.

$ oc get volumesnapshotclass
No resources found

$ oc delete volumesnapshot -n default --all --force --grace-period=0
----

$ oc get volumesnapshot -A
NAMESPACE   NAME                   READYTOUSE   SOURCEPVC     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                               SNAPSHOTCONTENT                                    CREATIONTIME   AGE
default     test-cephfs-snapshot   false        test-cephfs                           2Gi           ocs-storagecluster-cephfsplugin-snapclass   snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   36h            36h
default     test-rbd-snapshot      false        test-rbd                              5Gi           ocs-storagecluster-rbdplugin-snapclass      snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   36h            36h

$ oc get volumesnapshotcontent -A
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT         AGE
snapcontent-602939aa-73dc-43b2-869e-db975a5a9b05   true         5368709120    Delete           openshift-storage.rbd.csi.ceph.com      ocs-storagecluster-rbdplugin-snapclass      test-rbd-snapshot      36h
snapcontent-bc40d6e8-1387-40df-9e46-104dda851630   true         2147483648    Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   test-cephfs-snapshot   36h

Version-Release number of selected component (if applicable):
=============================================================
Last tested with 4.6.0-0.nightly-2020-10-22-034051 and OCS = 4.6.0-147.ci

How reproducible:
======================
Always

Steps to Reproduce:
1. Create an OCS 4.6 cluster with OCP 4.6
2. Create one CephFS PVC and one RBD PVC, and create snapshots of them using the default VS classes
3. To initiate OCS uninstall, delete the OBCs and PVCs but do not delete the VS
4. Delete the StorageCluster, which in turn deletes the VolumeSnapshotClass
   $ oc delete -n openshift-storage storagecluster --all --wait=true
5. Try to delete the dangling, leftover VolumeSnapshots now that the CephCluster is already gone (no ceph access)
   $ oc delete volumesnapshot -n <project-name> --all --force --grace-period=0
6. See if the VS deletion succeeds
   $ oc get volumesnapshotcontent -A
   $ oc get volumesnapshot -A

Actual results:
---------------------
The VolumeSnapshots fail to get deleted, even with the force option

Expected results:
-------------------------
The user should be able to forcefully delete the leftovers in case the VolumeSnapshotClass is unknowingly deleted before the VS

Additional info:
=======================
$ oc logs csi-snapshot-controller-8bb7f7589-64tts -f -n openshift-cluster-storage-operator|tee csi-snapshot-controller-8bb7f7589-64tts.log

E1030 06:52:00.160184 1 snapshot_controller.go:1180] failed to retrieve snapshot class ocs-storagecluster-cephfsplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"ocs-storagecluster-cephfsplugin-snapclass\" not found"
E1030 06:52:00.160292 1 snapshot_controller_base.go:331] checkAndUpdateSnapshotClass failed to getSnapshotClass volumesnapshotclass.snapshot.storage.k8s.io "ocs-storagecluster-cephfsplugin-snapclass" not found
I1030 06:52:00.160341 1 snapshot_controller.go:897] cannot get claim from snapshot [test-cephfs-snapshot]: [failed to retrieve PVC test-cephfs from the lister: "persistentvolumeclaim \"test-cephfs\" not found"] Claim may be deleted already. No need to remove finalizer on the claim.
------------------------------------------------------------------
From GitHub issue
==================
https://github.com/kubernetes-csi/external-snapshotter/issues/412

```
I1029 10:13:02.926462 1 snapshot_controller.go:308] checkandRemoveSnapshotFinalizersAndCheckandDeleteContent: set DeletionTimeStamp on content [snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4].
I1029 10:13:02.934162 1 snapshot_controller.go:316] checkandRemoveSnapshotFinalizersAndCheckandDeleteContent: Remove Finalizer for VolumeSnapshot[default/rbd-pvc-snapshot]
I1029 10:13:02.934192 1 snapshot_controller.go:901] checkandRemovePVCFinalizer for snapshot [rbd-pvc-snapshot]: snapshot status [&v1beta1.VolumeSnapshotStatus{BoundVolumeSnapshotContentName:(*string)(0xc000285150), CreationTime:(*v1.Time)(0xc0004c2d80), ReadyToUse:(*bool)(0xc0002a8308), RestoreSize:(*resource.Quantity)(0xc00029f300), Error:(*v1beta1.VolumeSnapshotError)(0xc0002851b0)}]
I1029 10:13:02.943296 1 reflector.go:369] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: forcing resync
I1029 10:13:02.943400 1 snapshot_controller_base.go:158] enqueued "default/rbd-pvc-snapshot" for sync
I1029 10:13:02.951105 1 util.go:264] storeObjectUpdate updating snapshot "default/rbd-pvc-snapshot" with version 1567
I1029 10:13:02.951304 1 snapshot_controller.go:1344] Removed protection finalizer from volume snapshot default/rbd-pvc-snapshot
I1029 10:13:02.951460 1 snapshot_controller_base.go:202] syncSnapshotByKey[default/rbd-pvc-snapshot]
I1029 10:13:02.951525 1 snapshot_controller_base.go:205] snapshotWorker: snapshot namespace [default] name [rbd-pvc-snapshot]
I1029 10:13:02.951545 1 snapshot_controller_base.go:328] checkAndUpdateSnapshotClass [rbd-pvc-snapshot]: VolumeSnapshotClassName [csi-rbdplugin-snapclass]
I1029 10:13:02.951555 1 snapshot_controller.go:1176] getSnapshotClass: VolumeSnapshotClassName [csi-rbdplugin-snapclass]
E1029 10:13:02.951573 1 snapshot_controller.go:1180] failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"csi-rbdplugin-snapclass\" not found"
E1029 10:13:02.951597 1 snapshot_controller_base.go:331] checkAndUpdateSnapshotClass failed to getSnapshotClass volumesnapshotclass.snapshot.storage.k8s.io "csi-rbdplugin-snapclass" not found
I1029 10:13:02.951609 1 snapshot_controller.go:721] updateSnapshotStatusWithEvent[default/rbd-pvc-snapshot]
I1029 10:13:02.951618 1 snapshot_controller.go:724] updateSnapshotStatusWithEvent[rbd-pvc-snapshot]: the same error &{2020-10-29 10:11:39 +0000 UTC 0xc0002851d0} is already set
I1029 10:13:02.951694 1 snapshot_controller_base.go:220] Snapshot "default/rbd-pvc-snapshot" is being deleted. SnapshotClass has already been removed
I1029 10:13:02.951715 1 snapshot_controller_base.go:222] Updating snapshot "default/rbd-pvc-snapshot"
I1029 10:13:02.951728 1 snapshot_controller_base.go:358] updateSnapshot "default/rbd-pvc-snapshot"
I1029 10:13:02.951751 1 util.go:264] storeObjectUpdate updating snapshot "default/rbd-pvc-snapshot" with version 1567
I1029 10:13:02.951778 1 snapshot_controller.go:180] synchronizing VolumeSnapshot[default/rbd-pvc-snapshot]: bound to: "snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4", Completed: false
I1029 10:13:02.951792 1 snapshot_controller.go:182] syncSnapshot [default/rbd-pvc-snapshot]: check if we should remove finalizer on snapshot PVC source and remove it if we can
I1029 10:13:02.951813 1 snapshot_controller.go:901] checkandRemovePVCFinalizer for snapshot [rbd-pvc-snapshot]: snapshot status [&v1beta1.VolumeSnapshotStatus{BoundVolumeSnapshotContentName:(*string)(0xc000285150), CreationTime:(*v1.Time)(0xc0004c2d80), ReadyToUse:(*bool)(0xc0002a8308), RestoreSize:(*resource.Quantity)(0xc00029f300), Error:(*v1beta1.VolumeSnapshotError)(0xc0002851b0)}]
I1029 10:13:02.951893 1 snapshot_controller.go:191] syncSnapshot[default/rbd-pvc-snapshot]: check if we should add invalid label on snapshot
I1029 10:13:02.951919 1 snapshot_controller.go:238] processSnapshotWithDeletionTimestamp VolumeSnapshot[default/rbd-pvc-snapshot]: bound to: "snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4", Completed: false
I1029 10:13:02.951936 1 snapshot_controller.go:272] processSnapshotWithDeletionTimestamp[default/rbd-pvc-snapshot]: delete snapshot content and remove finalizer from snapshot if needed
I1029 10:13:02.951951 1 snapshot_controller.go:278] checkandRemoveSnapshotFinalizersAndCheckandDeleteContent VolumeSnapshot[default/rbd-pvc-snapshot]: bound to: "snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4", Completed: false
I1029 10:13:02.951991 1 snapshot_controller.go:796] isVolumeBeingCreatedFromSnapshot: no volume is being created from snapshot default/rbd-pvc-snapshot
I1029 10:13:02.952021 1 snapshot_controller.go:297] checkandRemoveSnapshotFinalizersAndCheckandDeleteContent[default/rbd-pvc-snapshot]: Set VolumeSnapshotBeingDeleted annotation on the content [snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4]
I1029 10:13:02.952036 1 snapshot_controller.go:308] checkandRemoveSnapshotFinalizersAndCheckandDeleteContent: set DeletionTimeStamp on content [snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4].
I1029 10:13:02.964786 1 snapshot_controller.go:316] checkandRemoveSnapshotFinalizersAndCheckandDeleteContent: Remove Finalizer for VolumeSnapshot[default/rbd-pvc-snapshot]
I1029 10:13:02.964811 1 snapshot_controller.go:901] checkandRemovePVCFinalizer for snapshot [rbd-pvc-snapshot]: snapshot status [&v1beta1.VolumeSnapshotStatus{BoundVolumeSnapshotContentName:(*string)(0xc000285150), CreationTime:(*v1.Time)(0xc0004c2d80), ReadyToUse:(*bool)(0xc0002a8308), RestoreSize:(*resource.Quantity)(0xc00029f300), Error:(*v1beta1.VolumeSnapshotError)(0xc0002851b0)}]
I1029 10:13:02.974612 1 util.go:264] storeObjectUpdate updating snapshot "default/rbd-pvc-snapshot" with version 1567
I1029 10:13:02.974651 1 snapshot_controller.go:1344] Removed protection finalizer from volume snapshot default/rbd-pvc-snapshot
```

```
E1029 10:16:55.679819 1 snapshot_controller.go:224] getCSISnapshotInput failed to getClassFromVolumeSnapshot failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"csi-rbdplugin-snapclass\" not found"
E1029 10:16:55.685015 1 goroutinemap.go:150] Operation for "delete-snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4" failed. No retries permitted until 2020-10-29 10:17:27.679951762 +0000 UTC m=+442.637480008 (durationBeforeRetry 32s). Error: "failed to get input parameters to delete snapshot for content snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4: \"failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: \\\"volumesnapshotclass.snapshot.storage.k8s.io \\\\\\\"csi-rbdplugin-snapclass\\\\\\\" not found\\\"\""
I1029 10:16:55.684904 1 event.go:281] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4", UID:"ffc95922-3b31-494e-a40a-abc11f97c648", APIVersion:"snapshot.storage.k8s.io/v1beta1", ResourceVersion:"1565", FieldPath:""}): type: 'Warning' reason: 'SnapshotDeleteError' Failed to get snapshot class or credentials
E1029 10:17:55.679735 1 snapshot_controller.go:481] failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"csi-rbdplugin-snapclass\" not found"
E1029 10:17:55.679764 1 snapshot_controller.go:224] getCSISnapshotInput failed to getClassFromVolumeSnapshot failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"csi-rbdplugin-snapclass\" not found"
E1029 10:17:55.679819 1 goroutinemap.go:150] Operation for "delete-snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4" failed. No retries permitted until 2020-10-29 10:18:59.679780946 +0000 UTC m=+534.637309142 (durationBeforeRetry 1m4s). Error: "failed to get input parameters to delete snapshot for content snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4: \"failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: \\\"volumesnapshotclass.snapshot.storage.k8s.io \\\\\\\"csi-rbdplugin-snapclass\\\\\\\" not found\\\"\""
I1029 10:17:55.681554 1 event.go:281] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4", UID:"ffc95922-3b31-494e-a40a-abc11f97c648", APIVersion:"snapshot.storage.k8s.io/v1beta1", ResourceVersion:"1565", FieldPath:""}): type: 'Warning' reason: 'SnapshotDeleteError' Failed to get snapshot class or credentials
E1029 10:19:55.680458 1 snapshot_controller.go:481] failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"csi-rbdplugin-snapclass\" not found"
E1029 10:19:55.680653 1 snapshot_controller.go:224] getCSISnapshotInput failed to getClassFromVolumeSnapshot failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: "volumesnapshotclass.snapshot.storage.k8s.io \"csi-rbdplugin-snapclass\" not found"
E1029 10:19:55.680896 1 goroutinemap.go:150] Operation for "delete-snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4" failed. No retries permitted until 2020-10-29 10:21:57.68076332 +0000 UTC m=+712.638291701 (durationBeforeRetry 2m2s). Error: "failed to get input parameters to delete snapshot for content snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4: \"failed to retrieve snapshot class csi-rbdplugin-snapclass from the informer: \\\"volumesnapshotclass.snapshot.storage.k8s.io \\\\\\\"csi-rbdplugin-snapclass\\\\\\\" not found\\\"\""
I1029 10:19:55.687959 1 event.go:281] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"snapcontent-f4e2764a-7f9f-4619-a89e-9d7d1318bcd4", UID:"ffc95922-3b31-494e-a40a-abc11f97c648", APIVersion:"snapshot.storage.k8s.io/v1beta1", ResourceVersion:"1565", FieldPath:""}): type: 'Warning' reason: 'SnapshotDeleteError' Failed to get snapshot class or credentials
```
Hi Chris, does this have anything to do with the OCS-based CSI containers? Anyway, sorry, I was working on something else. I could try to reproduce this with the GA'd OCP 4.6 build by Monday (Nov 9) and provide you with the cluster.
Created attachment 1727165
Finalizer-null-approach-to-delete-the-leftovers

(In reply to Jan Safranek from comment #2)
> Yes, removing the finalizer with oc patch is the best workaround.

Yes, the finalizer approach worked.

[nberry@localhost bug-repro-1893739]$ oc patch -n default volumesnapshot/test-cephfs-snapshot --type=merge -p '{"metadata": {"finalizers":null}}'
volumesnapshot.snapshot.storage.k8s.io/test-cephfs-snapshot patched

$ oc patch -n default volumesnapshot/test-rbd-snapshot --type=merge -p '{"metadata": {"finalizers":null}}'; date --utc
volumesnapshot.snapshot.storage.k8s.io/test-rbd-snapshot patched
Fri Nov 6 16:31:41 UTC 2020

$ oc get volumesnapshot -A
No resources found

$ oc get volumesnapshotcontent
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT         AGE
snapcontent-45ed1739-4e66-47ae-a725-9f754c8fc418   true         5368709120    Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   test-cephfs-snapshot   85m
snapcontent-730a0fc5-18b5-441e-8b6f-2c448c23300b   true         10737418240   Delete           openshift-storage.rbd.csi.ceph.com      ocs-storagecluster-rbdplugin-snapclass      test-rbd-snapshot      84m

[nberry@localhost bug-repro-1893739]$ oc patch -n default volumesnapshotcontent/snapcontent-45ed1739-4e66-47ae-a725-9f754c8fc418 --type=merge -p '{"metadata": {"finalizers":null}}'; date --utc
volumesnapshotcontent.snapshot.storage.k8s.io/snapcontent-45ed1739-4e66-47ae-a725-9f754c8fc418 patched
Fri Nov 6 16:32:29 UTC 2020

[nberry@localhost bug-repro-1893739]$ oc patch -n default volumesnapshotcontent/snapcontent-730a0fc5-18b5-441e-8b6f-2c448c23300b --type=merge -p '{"metadata": {"finalizers":null}}'; date --utc
volumesnapshotcontent.snapshot.storage.k8s.io/snapcontent-730a0fc5-18b5-441e-8b6f-2c448c23300b patched
Fri Nov 6 16:32:49 UTC 2020

[nberry@localhost bug-repro-1893739]$ oc get volumesnapshotcontent
No resources found

Attached the controller logs for reference.
This issue pertains to deleting the VolumeSnapshotClass when the driver requires credentials. In this case, the driver fails to remove the backend snapshot, which prevents the VolumeSnapshotContent from being deleted, which in turn prevents the VolumeSnapshot from being deleted.

Note that even if this is resolved, we still won't be able to delete a VolumeSnapshot if the secret itself has been deleted, as the driver requires the secret to exist in order to proceed.

At this point I think we can fix the issue with the VolumeSnapshotClass being deleted and attempt to include the description in the VolumeSnapshotContent status message.

The relevant sections are below:
- Here we try to get the class and return nil credentials if it's not found
  - https://github.com/kubernetes-csi/external-snapshotter/blob/23b415b6aaa7e0eba402bc3984ae2b726b101a80/pkg/sidecar-controller/snapshot_controller.go#L184-L189
- And then we pass these nil credentials into the deletion request
  - https://github.com/kubernetes-csi/external-snapshotter/blob/23b415b6aaa7e0eba402bc3984ae2b726b101a80/pkg/sidecar-controller/snapshot_controller.go#L341-L350

Since we have nil credentials, the deletion request fails, and we then see this error logged from line 350.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633