Bug 1793387 - Deleting csi-cephfsplugin-provisioner pod during pod, PVC deletion leaves behind cephfs backed PV in Released state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: csi-driver
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.3.0
Assignee: Shyamsundar
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-21 08:43 UTC by Sidhant Agrawal
Modified: 2020-04-14 09:45 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-14 09:45:28 UTC
Embargoed:




Links:
Github ceph-csi pull 2 - last updated 2020-10-08 14:21:39 UTC
Github ceph/ceph-csi pull 808 (closed): Check for ENOENT errors when deleting CephFS volumes - last updated 2020-12-02 12:22:43 UTC
Red Hat Product Errata RHBA-2020:1437 - last updated 2020-04-14 09:45:46 UTC

Comment 2 Yaniv Kaul 2020-01-21 10:08:25 UTC
Why would you delete our pod? What's the use case here?

Comment 4 Shyamsundar 2020-01-21 16:33:37 UTC
The errors from the csi-cephfsplugin container logs show the following:

PV: pvc-47cf7659-39a3-11ea-bf5f-028803f4190c

2020-01-18T03:42:45.687619978Z E0118 03:42:45.687569       1 volume.go:71] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 failed to get the rootpath for the vol csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191(an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***])

2020-01-18T03:42:45.687676309Z E0118 03:42:45.687649       1 utils.go:161] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 GRPC error: rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

(validated the same for the other 2 PVs).

The error seems to be ENOENT, or missing subvolume on CephFS.
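
A minimal sketch of how that exit status maps to a missing subvolume, assuming the provisioner shells out to the ceph CLI as shown in the log above (subvolumeExists is a hypothetical helper used only for illustration, not actual ceph-csi code):

package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// subvolumeExists runs "ceph fs subvolume getpath" and treats exit status 2
// (ENOENT, per the log above) as "the subvolume does not exist" rather than
// as a hard failure.
func subvolumeExists(fsName, subvol, group string) (bool, error) {
	cmd := exec.Command("ceph", "fs", "subvolume", "getpath",
		fsName, subvol, "--group_name", group)
	if err := cmd.Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) && exitErr.ExitCode() == 2 {
			return false, nil // ENOENT: subvolume is already gone
		}
		return false, fmt.Errorf("getpath failed: %w", err)
	}
	return true, nil
}

func main() {
	exists, err := subvolumeExists("ocs-storagecluster-cephfilesystem",
		"csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191", "csi")
	fmt.Println("exists:", exists, "err:", err)
}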

I suspect the subvolume was deleted by the older instance of the provisioner (when it was bumped/restarted) before the CSI OMaps were removed, so this appears to have hit the upstream issue: https://github.com/ceph/ceph-csi/issues/474

Could we get the previous.log for the cephfs provisioner that was restarted? That would help trace whether the earlier instance got as far as subvolume deletion but did not complete the transaction, which would explain why the retry now fails as described in the upstream issue.
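
The upstream fix tracked in the Links section above ("Check for ENOENT errors when deleting CephFS volumes") takes the approach of treating ENOENT during deletion as "already deleted" and still completing the OMap cleanup, so a retried DeleteVolume can succeed and the PV is not left behind in Released state. Below is a minimal sketch of that idea only; purgeSubvolume and removeOMapKeys are hypothetical stand-ins for the driver's subvolume removal and journal cleanup, not the actual ceph-csi code:

package main

import (
	"errors"
	"fmt"
	"os"
)

// purgeSubvolume stands in for the driver's subvolume removal; here it
// simulates the retry case where the previous provisioner instance already
// removed the subvolume.
func purgeSubvolume(volID string) error {
	return fmt.Errorf("subvolume %s: %w", volID, os.ErrNotExist)
}

// removeOMapKeys stands in for removing the CSI journal/OMap entries for the
// volume.
func removeOMapKeys(volID string) error {
	fmt.Println("removed OMap entries for", volID)
	return nil
}

// deleteVolume treats a missing subvolume as already deleted so that the
// OMap cleanup still runs and a retried deletion can succeed.
func deleteVolume(volID string) error {
	if err := purgeSubvolume(volID); err != nil {
		if !errors.Is(err, os.ErrNotExist) {
			return fmt.Errorf("failed to delete subvolume %s: %w", volID, err)
		}
		// ENOENT: the previous instance removed the subvolume but did not
		// finish the transaction; fall through to the cleanup.
	}
	return removeOMapKeys(volID)
}

func main() {
	if err := deleteVolume("csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191"); err != nil {
		fmt.Println("DeleteVolume failed:", err)
	}
}

With handling along these lines, re-running the delete after a provisioner restart converges instead of failing on the missing subvolume.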

Comment 18 Humble Chirammal 2020-02-11 13:38:43 UTC
The downstream PR https://github.com/openshift/ceph-csi/pull/2 is merged on the release-4.3 branch, and I could see that the BZ automatically flipped the status to MODIFIED. That means the integration exists and is doing its job :)

Comment 26 errata-xmlrpc 2020-04-14 09:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437

