Bug 1793387

Summary: Deleting csi-cephfsplugin-provisioner pod during pod/PVC deletion leaves behind CephFS-backed PV in Released state
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Sidhant Agrawal <sagrawal>
Component: csi-driver
Assignee: Shyamsundar <srangana>
Status: CLOSED ERRATA
QA Contact: Sidhant Agrawal <sagrawal>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2
CC: assingh, ebenahar, hchiramm, jcollin, madam, ocs-bugs, srangana
Target Milestone: ---
Keywords: Automation
Target Release: OCS 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-14 09:45:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Comment 2 Yaniv Kaul 2020-01-21 10:08:25 UTC
Why would you delete our pod? What's the use case here?

Comment 4 Shyamsundar 2020-01-21 16:33:37 UTC
The errors from the csi-cephfsplugin container logs show the following:

PV: pvc-47cf7659-39a3-11ea-bf5f-028803f4190c

2020-01-18T03:42:45.687619978Z E0118 03:42:45.687569       1 volume.go:71] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 failed to get the rootpath for the vol csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191(an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***])

2020-01-18T03:42:45.687676309Z E0118 03:42:45.687649       1 utils.go:161] ID: 195 Req-ID: 0001-0011-openshift-storage-0000000000000001-4b9ded21-39a3-11ea-ad4f-0a580a830191 GRPC error: rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume getpath ocs-storagecluster-cephfilesystem csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi -m 172.30.126.232:6789,172.30.213.205:6789,172.30.246.82:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

(validated the same for the other 2 PVs).

The error appears to be ENOENT, i.e., the subvolume is missing on CephFS.
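For reference, the failing call can be replayed by hand (for example from the rook-ceph toolbox pod, assuming one is deployed); the filesystem and subvolume names below are copied from the log above, and getting exit status 2 again would confirm the subvolume is gone:

# Same getpath call the provisioner attempts; exit status 2 == ENOENT
ceph fs subvolume getpath ocs-storagecluster-cephfilesystem \
    csi-vol-4b9ded21-39a3-11ea-ad4f-0a580a830191 --group_name csi

# List what is still present in the csi subvolume group
ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi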

I suspect the subvolume was deleted by the older instance of the provisioner (before it was bumped/restarted), but the corresponding CSI OMaps were not cleaned up. This seems to have hit this issue: https://github.com/ceph/ceph-csi/issues/474
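To illustrate the suspected interleaving (a sketch, not taken from these logs):

# old provisioner: ceph fs subvolume rm ...        <- subvolume removed
# old provisioner: killed before the CSI OMap entries are removed
# new provisioner: ceph fs subvolume getpath ...   <- ENOENT, DeleteVolume keeps failing

If that is what happened, the stale OMap entries should still be visible in the CephFS metadata pool. The pool name below assumes the default OCS naming, and the object/key names assume the usual ceph-csi OMap layout (csi.volumes.default keyed by PV name); both may differ by version:

rados -p ocs-storagecluster-cephfilesystem-metadata listomapkeys csi.volumes.default
rados -p ocs-storagecluster-cephfilesystem-metadata getomapval csi.volumes.default \
    csi.volume.pvc-47cf7659-39a3-11ea-bf5f-028803f4190c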

Could we get the previous.log for the cephfs provisioner that was restarted? That would help trace whether we reached the subvolume deletion but did not complete the transaction, and hence are failing on retry as described in the upstream issue.
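Something like the following should pull the previous instance's log (the pod name is a placeholder; -c selects the csi-cephfsplugin container whose errors are quoted above):

oc logs <csi-cephfsplugin-provisioner-pod> -n openshift-storage -c csi-cephfsplugin --previous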

Comment 18 Humble Chirammal 2020-02-11 13:38:43 UTC
The downstream PR https://github.com/openshift/ceph-csi/pull/2 is merged on the release-4.3 branch, and I could see the BZ automatically flipped its status to MODIFIED. That means the integration exists and is doing its job :)

Comment 26 errata-xmlrpc 2020-04-14 09:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437