Bug 2155507

Summary:	RBD reclaimspace job fails when the PVC is not mounted
Product:	[Red Hat Storage] Red Hat OpenShift Data Foundation	Reporter:	Rachael <rgeorge>
Component:	csi-driver	Assignee:	Rakshith <rar>
Status:	CLOSED ERRATA	QA Contact:	kmanohar
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.12	CC:	ebenahar, hnallurv, kramdoss, ocs-bugs, odf-bz-bot, rar
Target Milestone:	---
Target Release:	ODF 4.13.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	4.13.0-93	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-06-21 15:22:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Rachael 2022-12-21 11:36:42 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

The RBD ReclaimSpaceJob fails with the following error when the PVC is not mounted:

Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7: No such file or directory
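When triaging runs like this, it can help to pull the failing path out of the message and check whether it is the CSI staging (`globalmount`) path that no longer exists. The following is a minimal, hypothetical triage helper; `classify_fstrim_error` and its labels are illustrative names, not part of ceph-csi or OCS-CI, and it only pattern-matches the message text shown above.

```python
import re

# Staging ("globalmount") path from the error above, split for readability.
STAGING_PATH = (
    "/var/lib/kubelet/plugins/kubernetes.io/csi/"
    "openshift-storage.rbd.csi.ceph.com/"
    "8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/"
    "globalmount/"
    "0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7"
)


def classify_fstrim_error(message):
    """Return (path, label) for an fstrim failure message, or None if it is not one.

    Hypothetical triage helper: it only inspects the message text.
    """
    match = re.search(r'failed to execute "fstrim" on "([^"]+)"', message)
    if match is None:
        return None
    path = match.group(1)
    if "/globalmount/" in path and "No such file or directory" in message:
        # The staging path was already gone when fstrim ran, possibly because
        # the volume was unstaged after the pod was deleted (see Comment 4).
        return path, "staging path missing"
    return path, "other fstrim failure"


# Rebuild the reported message and classify it.
message = (
    f'Failed to make node request: failed to execute "fstrim" on "{STAGING_PATH}" '
    f"(an error (exit status 1) occurred while running fstrim args: [{STAGING_PATH}]): "
    f"fstrim: cannot open {STAGING_PATH}: No such file or directory"
)
print(classify_fstrim_error(message)[1])  # staging path missing
```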


Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.12.0-0.nightly-2022-12-01-184212
ODF: 4.12.0-122


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Yes, the test passed in ODF 4.12.0-91

Steps to Reproduce:
-------------------
1. Create an RBD PVC of 25 GiB and attach it to an app pod.
2. Get the used size of the RBD pool.
3. Create a 10 GiB file on the PVC.
4. Delete the file.
5. Delete the pod.
6. Create a ReclaimSpaceJob for the PVC.
7. Verify that no errors are seen in the ReclaimSpaceJob.
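Step 6 can be sketched as building the ReclaimSpaceJob manifest programmatically. This is a minimal sketch, assuming the `csiaddons.openshift.io/v1alpha1` fields visible in the verification output later in this report (`spec.target.persistentVolumeClaim`, `backOffLimit`, `retryDeadlineSeconds`); the function name and the example object names are hypothetical.

```python
import json


def reclaim_space_job_manifest(name, namespace, pvc_name,
                               back_off_limit=10, retry_deadline_seconds=900):
    """Build a ReclaimSpaceJob manifest (as a dict) targeting one PVC.

    Defaults mirror the values seen in the verification output of this bug.
    """
    return {
        "apiVersion": "csiaddons.openshift.io/v1alpha1",
        "kind": "ReclaimSpaceJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "target": {"persistentVolumeClaim": pvc_name},
            "backOffLimit": back_off_limit,
            "retryDeadlineSeconds": retry_deadline_seconds,
        },
    }


# Hypothetical names; step 5 (deleting the pod) happens before this job is created.
manifest = reclaim_space_job_manifest("reclaimspacejob-example", "example-ns", "example-rbd-pvc")
print(json.dumps(manifest, indent=2))
```

The printed JSON could then be applied with, for example, `oc apply -f -`.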

OCS-CI test: https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py#L199


Actual results:
---------------
The RBD ReclaimSpaceJob fails when the PVC is not mounted.


Expected results:
-----------------
The RBD ReclaimSpaceJob should succeed.

Comment 4 Humble Chirammal 2022-12-21 12:01:58 UTC

Rachael, as an initial analysis, I am wondering whether this is really a regression. When you deleted the pod, the global mount (staging) path was supposed to be removed, so a job running against that path is expected to fail. This should have behaved the same way in the previous release, unless the ReclaimSpace job reached the staging path very quickly (that is, before the unmount had completed). Are we sure this is a regression?

Comment 8 Humble Chirammal 2022-12-22 07:39:56 UTC

Rakshith, thinking some more about this, I feel there is room for a small enhancement. If the path does not exist ("no such file or directory") when we actually attempt the operation, failing looks like the correct action. But if we notice that the VA (VolumeAttachment) object has a deletion timestamp or finalizer set, can we log that and skip triggering the job? Is this already handled?
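The enhancement proposed in this comment can be sketched as a small decision function. This is hypothetical and is not necessarily how the fix shipped in 4.13.0-93 works; `VolumeAttachmentState` and `reclaim_space_action` are illustrative names. The sketch keys only on the deletion timestamp, because in Kubernetes an object is being deleted exactly when `metadata.deletionTimestamp` is set, whereas a finalizer on its own does not indicate deletion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeAttachmentState:
    """Hypothetical minimal view of a VolumeAttachment (VA) object."""
    deletion_timestamp: Optional[str] = None


def reclaim_space_action(staging_path_exists: bool,
                         va: Optional[VolumeAttachmentState]) -> str:
    """Sketch of the node-side decision suggested above."""
    if va is not None and va.deletion_timestamp is not None:
        # The VA is being deleted: log it and skip triggering fstrim.
        return "skip: VolumeAttachment is being deleted"
    if not staging_path_exists:
        # The path is really missing when fstrim is attempted: failing is correct.
        return "fail: staging path does not exist"
    return "run fstrim"


# VA being deleted -> skip; VA present but path missing -> fail.
print(reclaim_space_action(False, VolumeAttachmentState("2022-12-21T11:36:42Z")))
print(reclaim_space_action(False, VolumeAttachmentState()))
```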

Comment 24 kmanohar 2023-05-22 11:01:44 UTC

VERIFICATION COMMENTS:

Verified on: 4.13.0-201

ReclaimSpaceJob output:

{'apiVersion': 'csiaddons.openshift.io/v1alpha1',
 'kind': 'ReclaimSpaceJob',
 'metadata': {'creationTimestamp': '2023-05-22T05:38:21Z',
              'generation': 1,
              'name': 'reclaimspacejob-pvc-test-ea5c79ef14774679b3aeb0bdd405211-ea1dbae3eb514c9181bce043a8c3a719',
              'namespace': 'namespace-test-72d1099c9088432a93dae8768',
              'resourceVersion': '2892378',
              'uid': '3f64b843-5bc9-431e-bf9e-4309e4f057c8'},
 'spec': {'backOffLimit': 10,
          'retryDeadlineSeconds': 900,
          'target': {'persistentVolumeClaim': 'pvc-test-ea5c79ef14774679b3aeb0bdd405211'}},
 'status': {'completionTime': '2023-05-22T05:38:49Z',
            'message': 'Reclaim Space operation successfully completed.',
            'result': 'Succeeded',
            'startTime': '2023-05-22T05:38:21Z'}}
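The success criterion checked during verification can be expressed as a short assertion over the job's `status` block. A minimal sketch, with hypothetical helper names (not OCS-CI functions), using the `status` values from the output above:

```python
from datetime import datetime

# Status block copied from the ReclaimSpaceJob output above.
status = {
    "completionTime": "2023-05-22T05:38:49Z",
    "message": "Reclaim Space operation successfully completed.",
    "result": "Succeeded",
    "startTime": "2023-05-22T05:38:21Z",
}

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def job_succeeded(status: dict) -> bool:
    """True when the job reports result 'Succeeded' (the expected result)."""
    return status.get("result") == "Succeeded"


def job_duration_seconds(status: dict) -> float:
    """Seconds between startTime and completionTime of a finished job."""
    start = datetime.strptime(status["startTime"], TIME_FORMAT)
    end = datetime.strptime(status["completionTime"], TIME_FORMAT)
    return (end - start).total_seconds()


print(job_succeeded(status), job_duration_seconds(status))  # True 28.0
```

For this run, the job took 28 seconds from `startTime` to `completionTime`.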

must-gather: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2155507/

Comment 25 kmanohar 2023-05-22 11:04:52 UTC

OCS-CI logs of the verification: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2155507/test3

Comment 28 errata-xmlrpc 2023-06-21 15:22:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742