Description of problem (please be as detailed as possible and provide log snippets):

If a CephFS PVC that has a snapshot is deleted, the PV remains in Released state. This issue is seen on an external mode cluster.

Error from ocs-ci test case tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level:

TimeoutError: Timeout when waiting for pvc-22488e5a-19b2-407f-9795-5e45af4146e8 to delete.

Describe output of the PV:

Name:            pvc-22488e5a-19b2-407f-9795-5e45af4146e8
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.cephfs.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ocs-external-storagecluster-cephfs
Status:          Released
Claim:           namespace-test-89e58b4549474bfda5a12d6ff7fd1673/pvc-test-e6d44eeac21d49abbcd4c3118d6d2959
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            openshift-storage.cephfs.csi.ceph.com
    FSType:
    VolumeHandle:      0001-0011-openshift-storage-0000000000000002-9bdd8109-19dc-11eb-a719-0a580a800226
    ReadOnly:          false
    VolumeAttributes:  clusterID=openshift-storage
                       fsName=cephfs
                       pool=cephfs_data
                       storage.kubernetes.io/csiProvisionerIdentity=1603966438499-8081-openshift-storage.cephfs.csi.ceph.com
                       subvolumeName=csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226
Events:
  Type     Reason              Age               From                                                                                                                     Message
  ----     ------              ----              ----                                                                                                                     -------
  Warning  VolumeFailedDelete  3m2s              openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-98d99f679-jpg49_bdabca3e-2906-47fa-bafb-7c20b5915c9a  persistentvolume pvc-22488e5a-19b2-407f-9795-5e45af4146e8 is still attached to node compute-1
  Warning  VolumeFailedDelete  42s (x8 over 3m)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-98d99f679-jpg49_bdabca3e-2906-47fa-bafb-7c20b5915c9a  rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

This error is repeated in the csi-provisioner container logs of the csi-cephfsplugin-provisioner-98d99f679-jpg49 pod:

2020-10-29T12:00:20.328999155Z I1029 12:00:20.328942 1 controller.go:1453] delete "pvc-22488e5a-19b2-407f-9795-5e45af4146e8": started
2020-10-29T12:00:20.394009818Z I1029 12:00:20.393963 1 connection.go:182] GRPC call: /csi.v1.Controller/DeleteVolume
2020-10-29T12:00:20.394063940Z I1029 12:00:20.393988 1 connection.go:183] GRPC request: {"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000002-9bdd8109-19dc-11eb-a719-0a580a800226"}
2020-10-29T12:00:21.875669355Z I1029 12:00:21.875584 1 connection.go:185] GRPC response: {}
2020-10-29T12:00:21.875669355Z I1029 12:00:21.875634 1 connection.go:186] GRPC error: rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
2020-10-29T12:00:21.875702878Z E1029 12:00:21.875665 1 controller.go:1463] delete "pvc-22488e5a-19b2-407f-9795-5e45af4146e8": volume deletion failed: rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
2020-10-29T12:00:21.875710250Z W1029 12:00:21.875700 1 controller.go:998] Retrying syncing volume "pvc-22488e5a-19b2-407f-9795-5e45af4146e8", failure 9
2020-10-29T12:00:21.875744325Z E1029 12:00:21.875728 1 controller.go:1016] error syncing volume "pvc-22488e5a-19b2-407f-9795-5e45af4146e8": rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
2020-10-29T12:00:21.876230484Z I1029 12:00:21.876144 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-22488e5a-19b2-407f-9795-5e45af4146e8", UID:"24e20902-f93a-4852-9202-f6598f955472", APIVersion:"v1", ResourceVersion:"168204", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi -m 10.1.8.45:6789,10.1.8.49:6789,10.1.8.62:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

Must-gather logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/rgeorge-oct29/rgeorge-oct29_20201029T075732/logs/failed_testcase_ocs_logs_1603971117/test_snapshot_at_different_usage_level_ocs_logs/

Version of all relevant components (if applicable):
OCS 4.6.0-148
OCP 4.6.0-0.nightly-2020-10-28-101609
$ ceph version
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
Yes, delete the PV manually (a command sketch is included under Additional info below).

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create a CephFS PVC test-pvc.
2. Create a snapshot of the PVC test-pvc.
3. Delete the PVC test-pvc.
4. Check whether the PV is deleted.

Or run the ocs-ci test case tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level, which performs the parent PVC deletion along with some other steps.

Actual results:
The PV is not deleted; it remains in Released state.

Expected results:
The PV should be deleted because the reclaimPolicy is Delete.

Additional info:
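For reference, a rough manual reproduction of the steps above might look like the sketch below. The namespace placeholder, the PVC/snapshot names, the snapshot API version and the VolumeSnapshotClass placeholder are assumptions and may differ per cluster; the StorageClass and PV names are taken from the describe output in this report.

# 1. Create a CephFS PVC using the external-mode storage class.
cat <<EOF | oc create -n <test-namespace> -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-external-storagecluster-cephfs
EOF

# 2. Create a snapshot of the PVC. The apiVersion and the snapshot class name
#    below are assumptions; use whatever the cluster actually provides.
cat <<EOF | oc create -n <test-namespace> -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-pvc-snapshot
spec:
  volumeSnapshotClassName: <cephfs-volumesnapshotclass>
  source:
    persistentVolumeClaimName: test-pvc
EOF

# 3. Delete the PVC while its snapshot still exists.
oc delete pvc test-pvc -n <test-namespace>

# 4. Check whether the bound PV goes away; with this bug it stays in Released state.
oc get pv | grep Released

# Workaround mentioned above: remove the leftover PV manually.
oc delete pv pvc-22488e5a-19b2-407f-9795-5e45af4146e8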
Thanks for looking into this, Madhu. For internal mode clusters we are using 14.2.8-111.el8cp with OCS 4.6. I am not sure where we document the minimum Ceph version for a particular OCS release; either we need to update that document, or convert this BZ to a documentation bug to record it.

Eran, the snapshot/clone feature for CephFS depends on certain Ceph features that were released as part of RHCS 4.1z2. We need to make sure that the external cluster also runs that Ceph version when working with OCS 4.6.
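As a quick check on an external cluster hitting this, one could verify the running Ceph version against the documented minimum and confirm that the CSI subvolume still holds snapshots, which is the likely reason the "fs subvolume rm" call fails with exit status 39 (typically ENOTEMPTY). A hedged sketch, run with admin access to the external RHCS cluster; the filesystem and subvolume names are taken from the error in the description:

# Report the Ceph version(s) running on the external cluster.
ceph versions

# List snapshots still held by the CSI subvolume named in the provisioner error;
# a non-empty list appears to be what blocks "fs subvolume rm" on Ceph builds
# older than the RHCS 4.1z2 level mentioned above.
ceph fs subvolume snapshot ls cephfs csi-vol-9bdd8109-19dc-11eb-a719-0a580a800226 --group_name csi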
Changing the component based on https://bugzilla.redhat.com/show_bug.cgi?id=1892819#c3.
We document the minimum RHCS version that we need here: https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html-single/planning_your_deployment/index#external-mode-requirements_rhocs
(In reply to Eran Tamir from comment #5)
> We document the minimum RHCS version that we need here:
> https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html-single/planning_your_deployment/index#external-mode-requirements_rhocs

@etamir, as of now the OCS 4.5 docs specify the minimum supported version as "Red Hat Ceph Storage version 4.1.1". Should we change the deployment guide and planning guide to mention "Red Hat Ceph Storage version 4.1.2"?
(In reply to Neha Berry from comment #6)
> (In reply to Eran Tamir from comment #5)
> > We document the minimum RHCS version that we need here:
> > https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html-single/planning_your_deployment/index#external-mode-requirements_rhocs
>
> @etamir, as of now the OCS 4.5 docs specify the minimum supported version as
> "Red Hat Ceph Storage version 4.1.1". Should we change the deployment guide
> and planning guide to mention "Red Hat Ceph Storage version 4.1.2"?

Yes, we need the fixes in RHCS 4.1z2 so that the snapshot/clone feature in OCS 4.6 can work.
>> If already released, we just need to mention 4.1.2 in the 4.6 docs as the minimum version for a freshly deployed cluster, right?

Yes, AFAIK it is released, hence we should mention 4.1.2.

>> Do we need a doc text for Known issue/Notable fix for this bug?

This should not fall into the known-issue category because we are already asking customers to work with the documented RHCS version. Customers are not supposed to work with any other version.