Bug 1793387
| Summary: | Deleting csi-cephfsplugin-provisioner pod during pod, PVC deletion leaves behind cephfs backed PV in Released state | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Sidhant Agrawal <sagrawal> |
| Component: | csi-driver | Assignee: | Shyamsundar <srangana> |
| Status: | CLOSED ERRATA | QA Contact: | Sidhant Agrawal <sagrawal> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2 | CC: | assingh, ebenahar, hchiramm, jcollin, madam, ocs-bugs, srangana |
| Target Milestone: | --- | Keywords: | Automation |
| Target Release: | OCS 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-14 09:45:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 2
Yaniv Kaul
2020-01-21 10:08:25 UTC
The errors from the csi-cephfsplugin container logs show the following for PV pvc-47cf7659-39a3-11ea-bf5f-028803f4190c:

```
2020-01-18T03:42:45.687619978Z E0118 03:42:45.687569 1 volume.go:71] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 failed to get the rootpath for the vol csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191(an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***])
2020-01-18T03:42:45.687676309Z E0118 03:42:45.687649 1 utils.go:161] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 GRPC error: rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]
```

(Validated the same for the other 2 PVs.)

The error appears to be ENOENT, i.e. a missing subvolume on CephFS. I suspect the subvolume was deleted, prior to deleting the CSI OMaps, by the older instance of the provisioner (when it was bumped/restarted). This looks like the same issue as upstream: https://github.com/ceph/ceph-csi/issues/474

Could we get the previous.log for the cephfs provisioner that was restarted? That would help trace whether we got as far as subvolume deletion and then did not complete the transaction, so that the retry now fails as described in the upstream issue.

The downstream PR https://github.com/openshift/ceph-csi/pull/2 is merged on the release-4.3 branch, and I could see the BZ automatically flipped the status to MODIFIED. That means the integration exists and is doing its job :)

Fix is contained in https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.3/82/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437
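The race described above (the old provisioner instance removes the subvolume, is restarted before it deletes the CSI OMaps, and the retry then fails with ENOENT, leaving the PV in Released) can be sketched as a toy model. This is an illustrative sketch only, not ceph-csi code; `FakeCluster`, `delete_volume`, and the volume name are all hypothetical stand-ins, and `idempotent=True` models the upstream fix of treating an already-deleted subvolume as success so cleanup can complete.

```python
class SubvolumeNotFound(Exception):
    """Models the 'exit status 2' (ENOENT) from `ceph fs subvolume getpath`."""


class FakeCluster:
    """Toy stand-in for a Ceph cluster plus the CSI journal OMaps."""

    def __init__(self):
        self.subvolumes = {"csi-vol-example"}
        self.omaps = {"csi-vol-example"}

    def remove_subvolume(self, name):
        if name not in self.subvolumes:
            raise SubvolumeNotFound(name)
        self.subvolumes.remove(name)

    def remove_omap(self, name):
        self.omaps.discard(name)


def delete_volume(cluster, name, idempotent=True):
    """DeleteVolume transaction: remove the subvolume, then its OMaps.

    With idempotent=True, a subvolume that is already gone (deleted by the
    previous provisioner instance) is treated as success, so the retry can
    still clean up the OMaps and the PV does not stay Released forever.
    """
    try:
        cluster.remove_subvolume(name)
    except SubvolumeNotFound:
        if not idempotent:
            raise  # old behaviour: retry fails with ENOENT, OMap leaks
    cluster.remove_omap(name)


# Simulate the crash window: the first provisioner instance removed the
# subvolume but was restarted before it could delete the OMaps.
cluster = FakeCluster()
cluster.remove_subvolume("csi-vol-example")

# The retry by the new provisioner instance now succeeds and clears the OMap.
delete_volume(cluster, "csi-vol-example")
print(cluster.omaps)  # -> set()
```

With `idempotent=False` the same retry raises `SubvolumeNotFound`, which mirrors the GRPC `Internal` error in the logs above: the OMap entry is never removed and the PV stays in Released state.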