Bug 1951399 - volumesnapshotcontent cannot be deleted; SnapshotDeleteError Failed to delete snapshot
Summary: volumesnapshotcontent cannot be deleted; SnapshotDeleteError Failed to delete...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: csi-driver
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Yug Gupta
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-20 04:33 UTC by henrychi
Modified: 2023-05-27 01:36 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 07:25:08 UTC
Embargoed:


Attachments (Terms of Use)

Description henrychi 2021-04-20 04:33:16 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

After restoring from an OADP backup with a CephFS CSI volume and then deleting the backup, a volumesnapshotcontent still exists. Attempting to delete it manually hangs.

oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
(hangs)

oc describe volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj

Spec:
  Deletion Policy:  Delete
  Driver:           openshift-storage.cephfs.csi.ceph.com
  Source:
    Snapshot Handle:           0001-0011-openshift-storage-0000000000000001-7594b7ad-a172-11eb-ba3e-0a580afe17a8
  Volume Snapshot Class Name:  ocs-storagecluster-cephfsplugin-snapclass-velero
  Volume Snapshot Ref:
    Kind:       VolumeSnapshot
    Name:       velero-demo-cephfs-pvc-vpl4t
    Namespace:  testns
    UID:        ce14ec3c-d8d6-4c83-a41a-f919a7d3966e
Status:
  Creation Time:    1618880071837960692
  Ready To Use:     true
  Restore Size:     0
  Snapshot Handle:  0001-0011-openshift-storage-0000000000000001-7594b7ad-a172-11eb-ba3e-0a580afe17a8
Events:
  Type     Reason               Age                    From                                                   Message
  ----     ------               ----                   ----                                                   -------
  Warning  SnapshotDeleteError  79m (x143 over 3h20m)  csi-snapshotter openshift-storage.cephfs.csi.ceph.com  Failed to delete snapshot
  Warning  SnapshotDeleteError  3m23s (x90 over 74m)   csi-snapshotter openshift-storage.cephfs.csi.ceph.com  Failed to delete snapshot


oc logs csi-cephfsplugin-provisioner-66c59d467f-ggwpd -c csi-snapshotter

I0420 01:08:31.456278       1 reflector.go:369] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: forcing resync
I0420 01:08:31.456388       1 snapshot_controller_base.go:140] enqueued "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj" for sync
I0420 01:08:31.456421       1 snapshot_controller_base.go:174] syncContentByKey[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
I0420 01:08:31.456443       1 util.go:258] storeObjectUpdate updating content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj" with version 82402937
I0420 01:08:31.456456       1 snapshot_controller.go:57] synchronizing VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
I0420 01:08:31.456497       1 snapshot_controller.go:531] Check if VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj] should be deleted.
I0420 01:08:31.456524       1 snapshot_controller.go:60] VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]: the policy is Delete
I0420 01:08:31.456532       1 snapshot_controller.go:92] Deleting snapshot for content: velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
I0420 01:08:31.456537       1 snapshot_controller.go:329] deleteCSISnapshotOperation [velero-velero-demo-cephfs-pvc-vpl4t-rdnbj] started
I0420 01:08:31.456542       1 snapshot_controller.go:181] getCSISnapshotInput for content [velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
I0420 01:08:31.456546       1 snapshot_controller.go:439] getSnapshotClass: VolumeSnapshotClassName [ocs-storagecluster-cephfsplugin-snapclass-velero]
E0420 01:08:31.457834       1 snapshot_controller_base.go:261] could not sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj": failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"
I0420 01:08:31.457873       1 snapshot_controller_base.go:163] Failed to sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", will retry again: failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"
I0420 01:08:31.458124       1 event.go:282] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", UID:"8ae5a30f-f90d-4cf9-b98f-58ba895622ae", APIVersion:"snapshot.storage.k8s.io/v1beta1", ResourceVersion:"82402937", FieldPath:""}): type: 'Warning' reason: 'SnapshotDeleteError' Failed to delete snapshot


Version of all relevant components (if applicable):
OADP 0.2.0 with CSI plugin
OCP 4.6.9
OCS 4.6.4


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

If a volumesnapshotcontent cannot be deleted, it's possible that storage usage keeps increasing even though a backup is deleted.


Is there any workaround available to the best of your knowledge?

No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3

Is this issue reproducible?

Yes

Can this issue be reproduced from the UI?

No

If this is a regression, please provide more details to justify this:

n/a

Steps to Reproduce:

1. Create a sample application that uses ocs-storagecluster-cephfs sc
oc new-project testns
oc apply -f demo.cephfs.yaml
oc apply -f testpod.yaml

cat demo.cephfs.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-cephfs-pvc
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 40Gi

cat testpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
  - command:
    - sleep
    - infinity
    image: registry.redhat.io/ubi8/ubi:latest
    imagePullPolicy: Always
    name: main
    resources: {}
    volumeMounts:
    - mountPath: /mnt
      name: cpd-data-vol
  restartPolicy: Never
  volumes:
  - name: cpd-data-vol
    persistentVolumeClaim:
      claimName: demo-cephfs-pvc


2.  Using OADP, create a backup
./velero backup create mybackup --include-namespaces testns --exclude-resources='Event,Event.events.k8s.io'


3.  Delete namespace
oc delete ns testns


4.  Using OADP, restore
./velero restore create --from-backup mybackup myrestore --exclude-resources='ImageTag'

After restore, there are 2 volumesnapshotcontents, and 1 volumesnapshot

oc get volumesnapshotcontents
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c   true         42949672960   Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   4m12s

velero-velero-demo-cephfs-pvc-vpl4t-rdnbj          true         0             Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   32s


oc get volumesnapshot
NAME                           READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT                       RESTORESIZE   SNAPSHOTCLASS                                      SNAPSHOTCONTENT                             CREATIONTIME   AGE
velero-demo-cephfs-pvc-vpl4t   true                     velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   0             ocs-storagecluster-cephfsplugin-snapclass-velero   velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   36s            36s

5.  Delete the backup
./velero backup delete mybackup


Actual results:

After deleting the backup, one of the volumesnapshotcontents still exists. Attempting to delete it manually hangs.

oc get volumesnapshotcontents
NAME                                        READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   true         0             Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   77s


oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
(hangs)


Expected results:

volumesnapshotcontents associated with the backup or restore should be deleted.
At the very least, it should be possible to manually delete it.


Additional info:

Comment 2 Madhu Rajanna 2021-04-20 05:54:57 UTC
>snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c   true         42949672960   Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   4m12s

The Retain policy for snapshots is not tested and may not be supported in OCS.


>I0420 01:08:31.456546       1 snapshot_controller.go:439] getSnapshotClass: VolumeSnapshotClassName [ocs-storagecluster-cephfsplugin-snapclass-velero]
E0420 01:08:31.457834       1 snapshot_controller_base.go:261] could not sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj": failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"

It looks like the snapshotclass was deleted before the volume snapshot object was deleted (hence "provided secret is empty"); see https://bugzilla.redhat.com/show_bug.cgi?id=1893739#c7. This might be what is preventing the snapshot deletion.

 
I don't know how OADP behaves in the case of snapshot backup and restore; at least from the samples provided above, it creates a snapshot and snapshot content and then deletes them.

>oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
(hangs)

The snapshotcontent object is not meant to be deleted by the user, as it is dynamically provisioned. Maybe try removing the finalizers and then deleting the volumesnapshotcontent?
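A sketch of that finalizer-removal approach, using the object name from this report (use with care: clearing finalizers bypasses the CSI driver's cleanup, so the backend snapshot may be leaked in the Ceph cluster):

```shell
# Clear the finalizers so the stuck API object can be removed.
# NOTE: this skips CSI cleanup; the Ceph-side snapshot may remain
# and need manual removal from the Ceph cluster.
oc patch volumesnapshotcontent velero-velero-demo-cephfs-pvc-vpl4t-rdnbj \
  --type merge -p '{"metadata":{"finalizers":null}}'

# The delete that previously hung should now return.
oc delete volumesnapshotcontent velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
```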


@Yug can you please try to reproduce this issue and check what is missing here?

Comment 3 Humble Chirammal 2021-04-20 06:20:12 UTC
(In reply to Madhu Rajanna from comment #2)
> >snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c   true         42949672960   Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   4m12s
> 
> Retain policy for the snapshot is not tested and maybe it is not supported
> in OCS.
> 
> 
> >I0420 01:08:31.456546       1 snapshot_controller.go:439] getSnapshotClass: VolumeSnapshotClassName [ocs-storagecluster-cephfsplugin-snapclass-velero]
> E0420 01:08:31.457834       1 snapshot_controller_base.go:261] could not
> sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj": failed to delete
> snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete
> snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code
> = InvalidArgument desc = provided secret is empty"
> 
> Looks like the snapshotclass is deleted before deleting the volume snapshot
> object (the provided secret is empty)
> https://bugzilla.redhat.com/show_bug.cgi?id=1893739#c7 . (this might be
> causing the issue to delete the snapshot)
> 
>  

That is the case here. The snapshot class is not available for some reason; maybe it was deleted manually, or it was not available at restore time.

henrychi, it would also help if you could list the exact process of snapshot backup and restore with respect to the volumesnapshotclass.

Comment 4 henrychi 2021-04-27 01:27:29 UTC
Let me know if more info is needed.

Comment 5 Yug Gupta 2021-04-29 14:13:30 UTC
While reproducing, deleting the backup via `./velero backup delete mybackup` does not seem to delete the volumesnapshot and volumesnapshotcontent created by velero.
Can you please share the configuration of the velero instance and the backup CRD used?

Comment 6 henrychi 2021-04-29 14:45:56 UTC
1) Example of velero configuration:


cat konveyor.openshift.io_v1alpha1_velero_cr.yaml

apiVersion: konveyor.openshift.io/v1alpha1
kind: Velero
metadata:
  name: example-velero
spec:
  olm_managed: false
  default_velero_plugins:
  - aws
  - openshift
  - csi
  custom_velero_plugins:
  - name: cpdbr-velero-plugin
    image: image-registry.openshift-image-registry.svc:5000/oadp-operator/cpdbr-velero-plugin:latest
  backup_storage_locations:
  - name: default
    provider: aws
    object_storage:
      bucket: velero
    config:
      region: minio
      s_3__force_path_style: "true"
      s_3__url: http://minio-velero.apps.mycluster.ibm.com
    credentials_secret_ref:
      name: oadp-repo-secret
      namespace: oadp-operator
  enable_restic: true
  velero_resource_allocation:
    limits:
      cpu: "1"
      memory: 512Mi
    requests:
      cpu: 500m
      memory: 256Mi
  restic_resource_allocation:
    limits:
      cpu: "1"
      memory: 16Gi
    requests:
      cpu: 500m
      memory: 256Mi
  velero_image_fqin: velero/velero:v1.5.4



2) Example VolumeSnapshotClass, with deletionPolicy set to Retain

cat ocs-storagecluster-cephfsplugin-snapclass-velero.yaml


    apiVersion: snapshot.storage.k8s.io/v1beta1
    deletionPolicy: Retain
    driver: openshift-storage.cephfs.csi.ceph.com
    kind: VolumeSnapshotClass
    metadata:
      name: ocs-storagecluster-cephfsplugin-snapclass-velero
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    parameters:
      clusterID: openshift-storage
      csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
      csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage



3) I didn't use a backup CRD; I just used the velero command line, e.g.

./velero backup create mybackup --include-namespaces testns --exclude-resources='Event,Event.events.k8s.io'

Comment 7 Humble Chirammal 2021-04-30 14:20:30 UTC
---snip--

2) Example VolumeSnapshotClass, with deletionPolicy set to Retain

cat ocs-storagecluster-cephfsplugin-snapclass-velero.yaml


    apiVersion: snapshot.storage.k8s.io/v1beta1
    deletionPolicy: Retain
    driver: openshift-storage.cephfs.csi.ceph.com
    kind: VolumeSnapshotClass
    metadata:
     name: ocs-storagecluster-cephfsplugin-snapclass-velero
     labels:
      velero.io/csi-volumesnapshot-class: "true"
    parameters:
     clusterID: openshift-storage
     csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
     csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage

--/snip--


For a PVC dynamically provisioned from an SC with "reclaimPolicy" set to "Retain", if a user deletes the PVC, the PV object and the underlying volume remain and have to be deleted manually. Similarly, if the VolumeSnapshotClass has "deletionPolicy" set to "Retain" and you delete the volume snapshot, I expect the VolumeSnapshotContent and the Ceph volume snapshot in the Ceph cluster to remain.

Isn't that the behavior we are seeing here?

Comment 8 Mudit Agarwal 2021-04-30 14:27:22 UTC
> 
> For a PVC which is provisioned from a SC with dynamic provisioning which has
> set the "reclaimPolicy" to "Retain", and if a user delete the PVC, PV object
> and underlying volume will remain there. It has to be "Manually deleted". 
> Similarly if the VolumeSnapshotClass is set "ReclaimPolicy" = "Retain" and
> if you delete the volume snapshot , I expect the VolumeSnapshotContent and
> "Ceph Volume Snapshot" in the ceph cluster to remain there. 
> 
> Isnt it the behaviour we are seeing here ?

But aren't they trying to delete it manually?
 
>> oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj

Comment 9 henrychi 2021-04-30 14:30:16 UTC
    I cannot even manually delete the volumesnapshotcontent. It hangs. From the problem description:


    1) After restore, there are 2 volumesnapshotcontents.

    oc get volumesnapshotcontents
    NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
    snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c   true         42949672960   Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   4m12s

    velero-velero-demo-cephfs-pvc-vpl4t-rdnbj          true         0             Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   32s


    2) After deleting the velero backup, there is 1 volumesnapshotcontent.

    oc get volumesnapshotcontents
    NAME                                        READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
    velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   true         0             Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   77s


    3) I cannot manually delete the 1 remaining volumesnapshotcontent. It hangs.

    oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
    (hangs)

Comment 10 Humble Chirammal 2021-04-30 15:33:17 UTC
Henry/Mudit, here is the confusion:

How was the second VolumeSnapshotContent created? When we take a backup, I expect 1 VolumeSnapshot and 1 VolumeSnapshotContent to be created, yet after restore we see ONLY one VolumeSnapshot but 2 VolumeSnapshotContents. Was the extra VolumeSnapshotContent created statically? Also, in the problem description we can see that the original VolumeSnapshotContent (velero-velero-demo-cephfs-pvc-vpl4t-rdnbj) has a restore size of "0", which shouldn't be the case. When was this size recorded: right after backup creation, after restore, or during some other operation?

Also, the volume snapshot (velero-velero-demo-cephfs-pvc-vpl4t-rdnbj) refers to the "ocs-storagecluster-cephfsplugin-snapclass-velero" volume snapshot class. Does this still exist in the cluster (as asked in comments 3 and 4, which have not yet been answered)?

I am not sure what Velero does in the backend at backup and restore time, so please provide these details, which could help us.

Comment 11 henrychi 2021-04-30 15:35:40 UTC
^

Comment 12 henrychi 2021-04-30 15:49:35 UTC
I don't know the internals of the OADP/Velero/CSI driver, so I can't comment on why a second volumesnapshotcontent is created during restore.

The original volumesnapshotcontent is snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c, created from backup.
The second volumesnapshotcontent is velero-velero-demo-cephfs-pvc-vpl4t-rdnbj, created from restore.

I can't see comment #3.
Comment #4 is mine.

There are no volume snapshots existing in the cluster:
oc get volumesnapshot -A
No resources found

Comment 13 Humble Chirammal 2021-04-30 15:54:46 UTC
(In reply to henrychi from comment #12)
> I don't know that internals of OADP/Velero/CSI driver, so can't comment on
> why a second volumesnapshotcontent is created during restore.
> 
> The original volumesnapshotcontent is
> snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c, created from backup.
> The second volumesnapshotcontent is
> velero-velero-demo-cephfs-pvc-vpl4t-rdnbj, created from restore.
> 
> I can't see comment #3.
> Comment #4 is mine.
> 
> There are no volume snapshots existing in the cluster:
> oc get volumesnapshot -A
> No resources found

I just noticed those comments are marked `internal`; I was assuming internal comments were visible to you. That caused the confusion: I meant comments 2 and 3, not 3 and 4.

I have now made comments 2 and 3 public.

Comment 14 henrychi 2021-04-30 16:01:05 UTC
Regarding the questions about the volumesnapshotclasses: I created those manually before doing a backup and restore. After initial creation, they aren't touched. The volumesnapshotclasses still exist in my cluster.

1.  Create volumesnapshotclasses

vi ocs-storagecluster-rbdplugin-snapclass-velero.yaml


apiVersion: snapshot.storage.k8s.io/v1beta1
deletionPolicy: Retain
driver: openshift-storage.rbd.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  name: ocs-storagecluster-rbdplugin-snapclass-velero
  labels:
    velero.io/csi-volumesnapshot-class: "true"
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage


vi ocs-storagecluster-cephfsplugin-snapclass-velero.yaml


    apiVersion: snapshot.storage.k8s.io/v1beta1
    deletionPolicy: Retain
    driver: openshift-storage.cephfs.csi.ceph.com
    kind: VolumeSnapshotClass
    metadata:
      name: ocs-storagecluster-cephfsplugin-snapclass-velero
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    parameters:
      clusterID: openshift-storage
      csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
      csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage


2.  Backup

3.  Restore

4.  Check if volumesnapshotclasses still exist, and the answer is yes


oc get volumesnapshotclass
NAME                                               DRIVER                                  DELETIONPOLICY   AGE
ocs-storagecluster-cephfsplugin-snapclass          openshift-storage.cephfs.csi.ceph.com   Delete           97d
ocs-storagecluster-cephfsplugin-snapclass-velero   openshift-storage.cephfs.csi.ceph.com   Retain           97d
ocs-storagecluster-rbdplugin-snapclass             openshift-storage.rbd.csi.ceph.com      Delete           97d
ocs-storagecluster-rbdplugin-snapclass-velero      openshift-storage.rbd.csi.ceph.com      Retain           97d

Comment 15 Mudit Agarwal 2021-04-30 16:07:16 UTC
Thanks Henry, I think that answers the question of why we have two snapshot contents: restore also creates a new snapshot as well as a new snapshot content.

Two things I want to mention here:

1. This is closely related to https://bugzilla.redhat.com/show_bug.cgi?id=1952708, and we need to see why the restore size for the snapshot is 0.
2. As Madhu mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1951399#c2, we don't yet support the Retain policy for snapshotclasses and need to experiment with it. Right now OCS supports only the default snapshot classes, i.e. ocs-storagecluster-cephfsplugin-snapclass and ocs-storagecluster-rbdplugin-snapclass.

Comment 16 henrychi 2021-04-30 16:25:47 UTC
Thanks.
I just want to add that using the Retain policy for the snapshotclass was suggested to us by some folks from OADP, and it makes sense to me.
A typical test scenario is to delete the namespace and then restore. If the policy is Delete, then when the volumesnapshot is deleted, the volumesnapshotcontent is deleted too, and restore won't work.

Comment 18 Yug Gupta 2021-05-12 06:44:48 UTC
A normally created volumesnapshotcontent has the following annotations, which contain the deletion secret name and namespace:

```
[ygupta@localhost cephfs]$ kubectl get volumesnapshotcontent snapcontent-6593fae7-5f12-41bd-b05f-c62d1a980ba4 -oyaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  annotations:
    snapshot.storage.kubernetes.io/deletion-secret-name: csi-cephfs-secret
    snapshot.storage.kubernetes.io/deletion-secret-namespace: 
  creationTimestamp: "2021-05-12T05:48:23Z"
```

On the other hand, when Madhu and I looked into the velero-created volumesnapshotcontent, it does not have the above-mentioned annotations set and lacks the secret information.

```
[ygupta@localhost cephfs]$ kubectl get volumesnapshotcontent velero-velero-csi-cephfs-pvc-5brg9-5wjdg -oyaml
apiVersion: v1
items:
- apiVersion: snapshot.storage.k8s.io/v1
  kind: VolumeSnapshotContent
  metadata:
    annotations:
      snapshot.storage.kubernetes.io/volumesnapshot-being-deleted: "yes"
    creationTimestamp: "2021-05-11T11:49:37Z"
```

Because the necessary secret information is missing, deletion of the volumesnapshotcontent created on velero restore gets stuck with the error "provided secret is empty".

For this reason, it does not look like an issue with the OCS operator but with the restore operation by velero itself, since the restored objects are missing some important annotations.
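A quick way to check for the missing annotations on the stuck object (the object name is the one from this report; this requires cluster access, so output will vary):

```shell
# Print only the annotations of the stuck VolumeSnapshotContent.
# A deletable object should show the two deletion-secret keys:
#   snapshot.storage.kubernetes.io/deletion-secret-name
#   snapshot.storage.kubernetes.io/deletion-secret-namespace
oc get volumesnapshotcontent velero-velero-demo-cephfs-pvc-vpl4t-rdnbj \
  -o jsonpath='{.metadata.annotations}{"\n"}'
```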

Comment 19 henrychi 2021-05-12 17:09:07 UTC
The first volumesnapshotcontent, which velero backed up, has the annotations. The mysterious second volumesnapshotcontent, created during restore, doesn't have them.
I'm just an end user of OADP and don't know why or how the second one is created. Let me know if there's more info I can provide.

Is there a way to safely delete the second volumesnapshotcontent, without leaking disk space?

Comment 20 Yug Gupta 2021-05-17 05:31:01 UTC
(In reply to henrychi from comment #19)
> The first volumensnapshotcontent that velero backed up has the annotations. 
> The mysterious second volumesnapshotcontent that is created during restore
> doesn't have the annotations.
> I'm just an end user of OADP, and don't know why or how the second one is
> created.  Let me know if there's more info I can provide.
> 

As for velero's internal implementation, the velero team can probably provide more insight.

> Is there a way to safely delete the second volumesnapshotcontent, without
> leaking disk space?

As mentioned earlier, the second VolumeSnapshotContent is missing the annotations required to perform the deletion.

As a workaround, you can edit the VolumeSnapshotContent and add the required annotations manually so that the deletion can go through.
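A sketch of that workaround using `oc annotate` (the secret name and namespace here are taken from the cephfs VolumeSnapshotClass parameters shown earlier in this report; adjust them for your driver):

```shell
# Add the deletion-secret annotations the restored object is missing;
# the values mirror the snapshotter-secret parameters of the
# ocs-storagecluster-cephfsplugin-snapclass-velero class.
oc annotate volumesnapshotcontent velero-velero-demo-cephfs-pvc-vpl4t-rdnbj \
  snapshot.storage.kubernetes.io/deletion-secret-name=rook-csi-cephfs-provisioner \
  snapshot.storage.kubernetes.io/deletion-secret-namespace=openshift-storage

# With the secret now resolvable, the previously hanging delete completes.
oc delete volumesnapshotcontent velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
```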

Comment 21 henrychi 2021-05-17 15:59:47 UTC
The workaround of adding the annotations manually allows the volumesnapshotcontent to be deleted.  Thanks.

Comment 22 Madhu Rajanna 2021-05-18 07:25:08 UTC
@henrychi, I am closing this BZ as not a bug from the OCS side. Please feel free to reopen if you think it's an OCS issue.

