Description of problem (please be as detailed as possible and provide log snippets):

OADP testing (backup flow), with the volumesnapshotclass deletion policy set to 'Delete'. While the backup is running, volumesnapshot & volumesnapshotcontent objects are created, and at the end of the test both are deleted. However, when checking the 'cephblockpool', the csi-snap images still exist.

Version of all relevant components (if applicable):
OCP 4.12.9
ODF 4.12.2
OADP 1.2.0-63

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
The pool can accumulate thousands or tens of thousands of csi-snap images, which may impact Ceph performance.

Is there any workaround available to the best of your knowledge?
Manually delete them.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue reproducible?
Yes

Can this issue reproduce from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Delete any existing volumesnapshot, volumesnapshotcontent, and csi-snap images (from Ceph)
2. Set the volumesnapshotclass deletion policy to 'Delete'
3. Create a namespace with a few pods
4. Run a CSI backup (OADP)
5. During the test, VSs & VSCs are created
6. Test completed - VSs & VSCs are deleted
7. Check the Ceph pool - the csi-snap images are not deleted

must-gather output:
https://drive.google.com/drive/folders/1fr1g04Xj9I4la93neJqFaWKvHiqcinrG?usp=share_link

Actual results:
csi-snap images aren't deleted

Expected results:
csi-snap images are deleted

Additional info:

[root@f01-h07-000-r640 playbooks]# oc get sc
NAME                                    PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
nvme-disks                              kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  36d
ocs-storagecluster-ceph-rbd (default)   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   36d
ocs-storagecluster-ceph-rgw             openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  36d
ocs-storagecluster-cephfs               openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   36d
ocs-storagecluster-cephfs-shallow       openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   22d
openshift-storage.noobaa.io             openshift-storage.noobaa.io/obc         Delete          Immediate              false                  36d
ssd-disks                               kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  36d

[root@f01-h07-000-r640 playbooks]# oc get volumesnapshotclass
NAME                                        DRIVER                                  DELETIONPOLICY   AGE
ocs-storagecluster-cephfsplugin-snapclass   openshift-storage.cephfs.csi.ceph.com   Delete           36d
ocs-storagecluster-rbdplugin-snapclass      openshift-storage.rbd.csi.ceph.com      Delete           36d
scale-volumesnapshotclass                   openshift-storage.rbd.csi.ceph.com      Delete           35d

[root@f01-h07-000-r640 playbooks]# oc get volumesnapshotclass scale-volumesnapshotclass -oyaml
apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Delete
driver: openshift-storage.rbd.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"snapshot.storage.k8s.io/v1","deletionPolicy":"Delete","driver":"openshift-storage.rbd.csi.ceph.com","kind":"VolumeSnapshotClass","metadata":{"annotations":{"snapshot.storage.kubernetes.io/is-default-class":"true"},"labels":{"velero.io/csi-volumesnapshot-class":"true"},"name":"scale-volumesnapshotclass"},"parameters":{"clusterID":"openshift-storage","csi.storage.k8s.io/snapshotter-secret-name":"rook-csi-rbd-provisioner","csi.storage.k8s.io/snapshotter-secret-namespace":"openshift-storage"}}
    snapshot.storage.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2023-04-05T11:01:53Z"
  generation: 52
  labels:
    velero.io/csi-volumesnapshot-class: "true"
  name: scale-volumesnapshotclass
  resourceVersion: "31746873"
  uid: 02635ee4-fb62-4aa1-8f8d-36df79bcaa0d
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage

sh-4.4$ rbd ls --pool=ocs-storagecluster-cephblockpool
csi-snap-32a8aae4-4a13-4665-a4ec-d1966d6ca4d1
csi-snap-4f4cdeda-7170-4390-8616-8808fa4820d6
csi-snap-8637a09d-6415-41c5-956c-6423f10e6d8a
csi-snap-97fd08d1-5d51-4108-bde7-35eb98ec3c5d
csi-snap-a6596798-4d4f-47db-889c-cf85f67c7502
csi-snap-a9d4469d-bdba-4600-8f68-142df08e7f8d
csi-snap-baa534a7-3dbf-451b-b578-d1650e4e3769
csi-snap-ca800564-e4cf-4fa8-a6e2-735f99f65478
csi-snap-cfa2190b-fd9c-4602-8d9f-6e85ebd9824a
csi-snap-d740aebf-9170-4b53-a069-839eed37b85a
csi-vol-011c6410-83c6-4e73-916a-ce086ef9bf10
csi-vol-0f470f3a-8810-4396-96ce-1f31d5db8fe2
csi-vol-108050ba-bef8-4cf5-b9a6-a62fd816988e
csi-vol-121d9ef5-13a1-4b25-9eea-a40481f65e37
csi-vol-72edaf1f-f5ae-488d-a1b9-71aa2b1a0447
csi-vol-7dc4a909-43f9-4b9e-b279-1aa8b9b64f95
csi-vol-7eaed530-829d-4c91-9de7-ba979ef1ecf9
csi-vol-9806be2d-1313-4773-b272-1f1869c1cf9d
csi-vol-bd6e5b1f-b554-402e-8cae-0c02508e015e
csi-vol-c162d738-dc25-476f-aa0f-74975aa09fff
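As a side note, a quick way to cross-check which of those csi-snap images are still referenced by a VolumeSnapshotContent is to compare image names against the snapshot handles. This is only a sketch; it assumes the cluster and pool above, and that (per ceph-csi naming) the UUID at the end of a content's snapshotHandle matches its csi-snap-<uuid> image name:

---- >% ----
# snapshot handles still known to the cluster
oc get volumesnapshotcontent -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.snapshotHandle}{"\n"}{end}'

# csi-snap images that exist in the pool (run where rbd has cluster access,
# e.g. the rook-ceph-tools pod)
rbd ls --pool=ocs-storagecluster-cephblockpool | grep '^csi-snap-'
---- %< ----

Any csi-snap UUID that does not show up in a snapshotHandle would be an orphaned image on the Ceph side.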
It seems that the VolumeSnapshotClass is not correctly configured. The logs from the csi-rbdplugin-provisioner/csi-snapshotter contain many messages like the following:

2023-04-26T04:03:09.673248713Z E0426 04:03:09.673201 1 snapshot_controller_base.go:283] could not sync content "snapcontent-e5c948cd-6435-457b-8d22-67bd35db0398-clone": failed to delete snapshot "snapcontent-e5c948cd-6435-457b-8d22-67bd35db0398-clone", err: failed to delete snapshot content snapcontent-e5c948cd-6435-457b-8d22-67bd35db0398-clone: "rpc error: code = Internal desc = provided secret is empty"

This prevents the VolumeSnapshotContent from being deleted. These objects are expected to remain in the cluster until deletion succeeds.

Could you please check:
1. the secrets referenced in the VolumeSnapshotClass
2. the VolumeSnapshotContent objects in the cluster
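A minimal sketch of those two checks, using the class and secret names from the report; the snapshot.storage.kubernetes.io/deletion-secret-* annotations on the contents are also worth a look, since an empty secret on delete usually points at them:

---- >% ----
# 1. secrets referenced by the VolumeSnapshotClass, and whether they exist
oc get volumesnapshotclass scale-volumesnapshotclass -o yaml | grep secret
oc -n openshift-storage get secret rook-csi-rbd-provisioner

# 2. remaining VolumeSnapshotContent objects: deletion policy, error message,
#    and the deletion-secret annotations (content name is a placeholder)
oc get volumesnapshotcontent
oc get volumesnapshotcontent <content-name> -o yaml | grep -E 'deletion-secret|deletionPolicy|message'
---- %< ----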
Not a 4.13 blocker
I have not been able to reproduce this with simple steps:

---- >% ----
#
# yaml files from github.com/ceph/ceph-csi/examples/rbd/
#

oc_wait_status()
{
	local TEMPLATE="${1}" UNTIL="${2}" OBJ="${3}"
	local STATUS=''

	while [ "${STATUS}" != "${UNTIL}" ]
	do
		[ -z "${STATUS}" ] || sleep 1
		STATUS=$(oc get --template="{{${TEMPLATE}}}" "${OBJ}")
	done
}

create_pvc()
{
	oc create -f pvc.yaml
	oc_wait_status .status.phase Bound persistentvolumeclaim/rbd-pvc
}

create_snapshot()
{
	oc create -f snapshot.yaml
	oc_wait_status .status.readyToUse true volumesnapshot/rbd-pvc-snapshot
}

restore_pvc()
{
	oc create -f pvc-restore.yaml
	oc_wait_status .status.phase Bound persistentvolumeclaim/rbd-pvc-restore
}

cleanup()
{
	cat pvc.yaml snapshot.yaml pvc-restore.yaml | oc delete -f- --wait
}

RUNS=0
while true
do
	create_pvc
	create_snapshot
	restore_pvc
	cleanup

	RUNS=$[RUNS+1]
	echo "run ${RUNS} finished"
	sleep 3
done
---- %< ----

There is no growing list of snapshots on the Ceph side after running this for a working day (~5000 iterations).
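If it helps when repeating the loop above, the leftover snapshots on the Ceph side can be counted between iterations; a sketch using the rbd command from the report (the pool name is the ODF default and may differ on other setups):

---- >% ----
# count csi-snap images currently in the pool (run where rbd has cluster access,
# e.g. the rook-ceph-tools pod)
rbd ls --pool=ocs-storagecluster-cephblockpool | grep -c '^csi-snap-'
---- %< ----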
Hello Madhu,

The ODF version was updated to 4.12.5, still the same behavior.
Will send you live cluster info in gchat.

Thanks
Once the policy was changed to 'Delete' and the 'volumesnapshot' & 'volumesnapshotcontent' objects were deleted, the 'csi-snap' images were deleted too.
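For completeness, that manual cleanup corresponds roughly to the following sketch (object names are placeholders; the patch flips the per-content deletionPolicy so the CSI driver removes the backing csi-snap image when the content is deleted):

---- >% ----
# ensure the content will delete its backing snapshot
oc patch volumesnapshotcontent <content-name> --type merge -p '{"spec":{"deletionPolicy":"Delete"}}'

# delete the pair; the csi-snap image in the pool should go away with the content
oc delete volumesnapshot <snapshot-name> -n <namespace>
oc delete volumesnapshotcontent <content-name>
---- %< ----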