Bug 1892819 - [External mode] Deleting a CephFS PVC that has a snapshot leaves the PV in the Released state
Summary: [External mode] Deleting a CephFS PVC that has a snapshot leaves the PV in...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: documentation
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Olive Lakra
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On:
Blocks: 1882363
 
Reported: 2020-10-29 17:55 UTC by Jilju Joy
Modified: 2021-08-25 14:55 UTC (History)
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-25 14:55:00 UTC
Embargoed:



Description Jilju Joy 2020-10-29 17:55:30 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
If a CephFS PVC that has a snapshot is deleted, the PV remains in the Released state. This issue is seen on external mode clusters.

Error from ocs-ci test case tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level:

E               TimeoutError: Timeout when waiting for pvc-22488e5a-19b2-407f-9795-5e45af4146e8 to delete. Describe output: Name:            pvc-22488e5a-19b2-407f-9795-5e45af4146e8
E               Labels:          <none>
E               Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.cephfs.csi.ceph.com
E               Finalizers:      [kubernetes.io/pv-protection]
E               StorageClass:    ocs-external-storagecluster-cephfs
E               Status:          Released
E               Claim:           namespace-test-89e58b4549474bfda5a12d6ff7fd1673/pvc-test-e6d44eeac21d49abbcd4c3118d6d2959
E               Reclaim Policy:  Delete
E               Access Modes:    RWO
E               VolumeMode:      Filesystem
E               Capacity:        10Gi
E               Node Affinity:   <none>
E               Message:         
E               Source:
E                   Type:              CSI (a Container Storage Interface (CSI) volume source)
E                   Driver:            openshift-storage.cephfs.csi.ceph.com
E                   FSType:            
E                   VolumeHandle:      0001-0011-openshift-storage-0000000000000002-9bdd8109-19dc-11eb-a719-0a580a800226
E                   ReadOnly:          false
E                   VolumeAttributes:      clusterID=openshift-storage
E                                          fsName=cephfs
E                                          pool=cephfs_data
E                                          storage.kubernetes.io/csiProvisionerIdentity=1603966438499-8081-openshift-storage.cephfs.csi.ceph.com
E                                          subvolumeName=csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226
E               Events:
E                 Type     Reason              Age               From                                                                                                                     Message
E                 ----     ------              ----              ----                                                                                                                     -------
E                 Warning  VolumeFailedDelete  3m2s              openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-98d99f679-jpg49_bdabca3e-2906-47fa-bafb-7c20b5915c9a  persistentvolume pvc-22488e5a-19b2-407f-9795-5e45af4146e8 is still attached to node compute-1
E                 Warning  VolumeFailedDelete  42s (x8 over 3m)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-98d99f679-jpg49_bdabca3e-2906-47fa-bafb-7c20b5915c9a  rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]




This error is repeated in the csi-provisioner container logs of the csi-cephfsplugin-provisioner-98d99f679-jpg49 pod:


2020-10-29T12:00:20.328999155Z I1029 12:00:20.328942       1 controller.go:1453] delete "pvc-22488e5a-19b2-407f-9795-5e45af4146e8": started
2020-10-29T12:00:20.394009818Z I1029 12:00:20.393963       1 connection.go:182] GRPC call: /csi.v1.Controller/DeleteVolume
2020-10-29T12:00:20.394063940Z I1029 12:00:20.393988       1 connection.go:183] GRPC request: {"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000002-9bdd8109-19dc-11eb-a719-0a580a800226"}
2020-10-29T12:00:21.875669355Z I1029 12:00:21.875584       1 connection.go:185] GRPC response: {}
2020-10-29T12:00:21.875669355Z I1029 12:00:21.875634       1 connection.go:186] GRPC error: rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
2020-10-29T12:00:21.875702878Z E1029 12:00:21.875665       1 controller.go:1463] delete "pvc-22488e5a-19b2-407f-9795-5e45af4146e8": volume deletion failed: rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
2020-10-29T12:00:21.875710250Z W1029 12:00:21.875700       1 controller.go:998] Retrying syncing volume "pvc-22488e5a-19b2-407f-9795-5e45af4146e8", failure 9
2020-10-29T12:00:21.875744325Z E1029 12:00:21.875728       1 controller.go:1016] error syncing volume "pvc-22488e5a-19b2-407f-9795-5e45af4146e8": rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
2020-10-29T12:00:21.876230484Z I1029 12:00:21.876144       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-22488e5a-19b2-407f-9795-5e45af4146e8", UID:"24e20902-f93a-4852-9202-f6598f955472", APIVersion:"v1", ResourceVersion:"168204", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
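The recurring failure above is `exit status 39` from `ceph fs subvolume rm`. Assuming the Ceph CLI passes the errno through as its exit status (an assumption, not confirmed in the logs), errno 39 on Linux is ENOTEMPTY ("Directory not empty"), which is consistent with the subvolume still holding snapshot data and therefore refusing removal. A minimal sketch of the decode using only the Python standard library:

```python
import errno
import os

# `ceph fs subvolume rm` exited with status 39. If that status is a
# passed-through errno value (an assumption), it decodes on Linux as
# ENOTEMPTY, i.e. the subvolume directory is not empty -- here, because
# the snapshot taken from the parent PVC still exists inside it.
print(errno.errorcode[39])  # 'ENOTEMPTY' on Linux
print(os.strerror(39))      # 'Directory not empty' on Linux
```

This matches the reproduction scenario: the parent PVC is deleted while its snapshot still exists, so the backing subvolume cannot be removed and the PV stays in Released.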




Must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/rgeorge-oct29/rgeorge-oct29_20201029T075732/logs/failed_testcase_ocs_logs_1603971117/test_snapshot_at_different_usage_level_ocs_logs/




Version of all relevant components (if applicable):
OCS 4.6.0-148
OCP 4.6.0-0.nightly-2020-10-28-101609

$ ceph version
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
No

Is there any workaround available to the best of your knowledge?
Yes, delete the PV manually.
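A hedged sketch of that manual cleanup, using the PV name and claim namespace from the describe output above (verify that nothing still needs the snapshot or the data before deleting anything):

```shell
# Sketch of the manual workaround; names are taken from the describe
# output above, and the exact steps are an assumption, not a verified
# procedure.

# 1. Delete the snapshot(s) of the parent PVC, if no longer needed,
#    so the backing subvolume can eventually be removed.
oc get volumesnapshot -n namespace-test-89e58b4549474bfda5a12d6ff7fd1673

# 2. Delete the Released PV manually.
oc delete pv pvc-22488e5a-19b2-407f-9795-5e45af4146e8
```

These commands run against a live cluster, so they are illustrative only.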

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes


Can this issue be reproduced from the UI?
Yes


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create a CephFS PVC, test-pvc.
2. Create a snapshot of the PVC test-pvc.
3. Delete the PVC test-pvc.
4. Check whether the PV is deleted.

Or

Run the ocs-ci test case tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level

This test case deletes the parent PVC along with performing some other steps.
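The reproduction steps above can be sketched as manifests. The storage class name comes from the describe output in this report; the object names, namespace, snapshot API version, and the VolumeSnapshotClass name are assumptions for illustration:

```yaml
# Step 1: a CephFS PVC (storage class name from the report).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-external-storagecluster-cephfs
---
# Step 2: a snapshot of test-pvc (the VolumeSnapshotClass name below is
# an assumed default, not taken from the report).
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-pvc-snapshot
spec:
  volumeSnapshotClassName: ocs-external-storagecluster-cephfsplugin-snapclass
  source:
    persistentVolumeClaimName: test-pvc
```

After applying both, `oc delete pvc test-pvc` (step 3) and then `oc get pv` (step 4) shows the PV stuck in the Released state on an affected external mode cluster.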


Actual results:
The PV is not deleted; it remains in the Released state.


Expected results:
The PV should be deleted because its reclaimPolicy is Delete.


Additional info:

Comment 3 Mudit Agarwal 2020-10-30 07:16:16 UTC
Thanks for looking into this, Madhu. For internal clusters we are using 14.2.8-111.el8cp with OCS 4.6.
I am not sure where we document the minimum Ceph version for a particular OCS release; either we need to update that document or convert this BZ to a documentation bug to record it.

Eran, the snapshot/clone feature for CephFS depends on certain Ceph features that were released as part of Ceph 4.1z2. We need to make sure that external clusters also run on that Ceph version when working with OCS 4.6.

Comment 4 Mudit Agarwal 2020-11-02 04:02:33 UTC
Changing the component based on https://bugzilla.redhat.com/show_bug.cgi?id=1892819#c3

Comment 6 Neha Berry 2020-11-09 09:38:51 UTC
(In reply to Eran Tamir from comment #5)
> We document here what is the minimal RHCS version that we need. 
> https://access.redhat.com/documentation/en-us/
> red_hat_openshift_container_storage/4.5/html-single/planning_your_deployment/
> index#external-mode-requirements_rhocs

@etamir as of now in the OCS 4.5 docs, the minimal supported version is specified as "Red Hat Ceph Storage version 4.1.1". Should we change the deployment guide and planning guide to mention "Red Hat Ceph Storage version 4.1.2"?

Comment 8 Mudit Agarwal 2020-11-09 13:01:27 UTC
(In reply to Neha Berry from comment #6)
> (In reply to Eran Tamir from comment #5)
> > We document here what is the minimal RHCS version that we need. 
> > https://access.redhat.com/documentation/en-us/
> > red_hat_openshift_container_storage/4.5/html-single/planning_your_deployment/
> > index#external-mode-requirements_rhocs
> 
> @etamir as of now in OCS 4.5 docs, the minimal supported version
> is specified as "Red Hat Ceph Storage version 4.1.1 " . Should we change the
> deployment guide and planning guide to mention "Red Hat Ceph Storage version
> 4.1.2 "

Yes, we need the fixes in RHCS 4.1z2 so that the snapshot/clone feature in OCS 4.6 can work.

Comment 12 Mudit Agarwal 2020-11-10 13:44:57 UTC
>> If already released, we just need to mention 4.1.2 in docs of 4.6 as minimum version for a freshly deployed cluster ? right ?
Yes, AFAIK it has been released, hence we should mention 4.1.2.

>> do we need a doc text for Known issue/Notable fix for this bug?
This should not fall into the category of a known issue, because we already ask customers to use the documented RHCS version. Customers are not supposed to use any other version.

