Bug 1793387 - Deleting csi-cephfsplugin-provisioner pod during pod, PVC deletion leaves behind cephfs backed PV in Released state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: csi-driver
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.3.0
Assignee: Shyamsundar
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-21 08:43 UTC by Sidhant Agrawal
Modified: 2020-04-14 09:45 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-14 09:45:28 UTC
Embargoed:




Links:
Github ceph-csi pull 2 - last updated 2020-10-08 14:21:39 UTC
Github ceph/ceph-csi pull 808 (closed): Check for ENOENT errors when deleting CephFS volumes - last updated 2020-12-02 12:22:43 UTC
Red Hat Product Errata RHBA-2020:1437 - last updated 2020-04-14 09:45:46 UTC

Comment 2 Yaniv Kaul 2020-01-21 10:08:25 UTC
Why would you delete our pod? What's the use case here?

Comment 4 Shyamsundar 2020-01-21 16:33:37 UTC
The errors from the csi-cephfsplugin container logs show the following:

PV: pvc-47cf7659-39a3-11ea-bf5f-028803f4190c

2020-01-18T03:42:45.687619978Z E0118 03:42:45.687569       1 volume.go:71] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 failed to get the rootpath for the vol csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191(an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***])

2020-01-18T03:42:45.687676309Z E0118 03:42:45.687649       1 utils.go:161] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 GRPC error: rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

(validated the same for the other 2 PVs).

The error seems to be ENOENT, or missing subvolume on CephFS.
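
A minimal sketch of how that exit status maps to a missing subvolume, assuming the provisioner shells out to the ceph CLI as shown in the log above (subvolumeExists is a hypothetical helper used only for illustration, not actual ceph-csi code):

package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// subvolumeExists runs "ceph fs subvolume getpath" and treats exit status 2
// (ENOENT, per the log above) as "the subvolume does not exist" rather than
// as a hard failure.
func subvolumeExists(fsName, subvol, group string) (bool, error) {
	cmd := exec.Command("ceph", "fs", "subvolume", "getpath",
		fsName, subvol, "--group_name", group)
	if err := cmd.Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) && exitErr.ExitCode() == 2 {
			return false, nil // ENOENT: subvolume is already gone
		}
		return false, fmt.Errorf("getpath failed: %w", err)
	}
	return true, nil
}

func main() {
	exists, err := subvolumeExists("ocs-storagecluster-cephfilesystem",
		"csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191", "csi")
	fmt.Println("exists:", exists, "err:", err)
}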

I suspect the subvolume was deleted by the older instance of the provisioner (when it was bumped/restarted) before the CSI OMaps were removed, so this appears to have hit the upstream issue: https://github.com/ceph/ceph-csi/issues/474

Could we get the previous.log for the cephfs provisioner that was restarted? That would help trace whether the earlier instance got as far as subvolume deletion but did not complete the transaction, which would explain why the retry now fails as described in the upstream issue.
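
The upstream fix tracked in the Links section above ("Check for ENOENT errors when deleting CephFS volumes") takes the approach of treating ENOENT during deletion as "already deleted" and still completing the OMap cleanup, so a retried DeleteVolume can succeed and the PV is not left behind in Released state. Below is a minimal sketch of that idea only; purgeSubvolume and removeOMapKeys are hypothetical stand-ins for the driver's subvolume removal and journal cleanup, not the actual ceph-csi code:

package main

import (
	"errors"
	"fmt"
	"os"
)

// purgeSubvolume stands in for the driver's subvolume removal; here it
// simulates the retry case where the previous provisioner instance already
// removed the subvolume.
func purgeSubvolume(volID string) error {
	return fmt.Errorf("subvolume %s: %w", volID, os.ErrNotExist)
}

// removeOMapKeys stands in for removing the CSI journal/OMap entries for the
// volume.
func removeOMapKeys(volID string) error {
	fmt.Println("removed OMap entries for", volID)
	return nil
}

// deleteVolume treats a missing subvolume as already deleted so that the
// OMap cleanup still runs and a retried deletion can succeed.
func deleteVolume(volID string) error {
	if err := purgeSubvolume(volID); err != nil {
		if !errors.Is(err, os.ErrNotExist) {
			return fmt.Errorf("failed to delete subvolume %s: %w", volID, err)
		}
		// ENOENT: the previous instance removed the subvolume but did not
		// finish the transaction; fall through to the cleanup.
	}
	return removeOMapKeys(volID)
}

func main() {
	if err := deleteVolume("csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191"); err != nil {
		fmt.Println("DeleteVolume failed:", err)
	}
}

With handling along these lines, re-running the delete after a provisioner restart converges instead of failing on the missing subvolume.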

Comment 18 Humble Chirammal 2020-02-11 13:38:43 UTC
The downstream PR https://github.com/openshift/ceph-csi/pull/2 is merged on the release-4.3 branch, and I could see that the BZ automatically flipped the status to MODIFIED. That means the integration exists and is doing its job :)

Comment 26 errata-xmlrpc 2020-04-14 09:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437

