Description of problem (please be detailed as possible and provide log snippets):
---------------------------------------------------------------------------------
RBD reclaimspace job fails, when the PVC is not mounted, with the following error:

Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7: No such file or directory

Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.12.0-0.nightly-2022-12-01-184212
ODF: 4.12.0-122

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)? 3

Can this issue be reproduced? Yes

Can this issue reproduce from the UI? Yes

If this is a regression, please provide more details to justify this:
Yes, the test passed in ODF 4.12.0-91

Steps to Reproduce:
-------------------
1. Create and attach an RBD PVC of size 25 GiB to an app pod.
2. Get the used size of the RBD pool.
3. Create a file of size 10 GiB.
4. Delete the file.
5. Delete the pod.
6. Create a ReclaimSpaceJob.
7. No errors should be seen in the reclaim space job.

OCS-CI test: https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py#L199

Actual results:
---------------
RBD reclaimspace job fails when the PVC is not mounted

Expected results:
-----------------
RBD reclaimspace job should succeed
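The ReclaimSpaceJob created in step 6 can be sketched as below. The PVC name, namespace, and job name are placeholders, not values from this bug; the backOffLimit/retryDeadlineSeconds values mirror those seen in the verification output further down:

```shell
# Write a sample ReclaimSpaceJob manifest (hypothetical names).
cat > reclaimspacejob.yaml <<'EOF'
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: reclaimspacejob-sample
  namespace: my-namespace
spec:
  target:
    persistentVolumeClaim: rbd-pvc-25gib   # the RBD PVC from step 1
  backOffLimit: 10
  retryDeadlineSeconds: 900
EOF

# On a cluster with the csi-addons operator, apply it and check the result:
#   kubectl apply -f reclaimspacejob.yaml
#   kubectl -n my-namespace get reclaimspacejob reclaimspacejob-sample \
#     -o jsonpath='{.status.result}'
echo "manifest written"
```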
Racheal, as an initial analysis, I am wondering whether this is really a regression. When you deleted the pod, the global mount (staging) path was expected to be torn down, and a job running against that path would be expected to fail. This should have behaved the same way in the previous release, unless the reclaimspace job there was simply fast enough to hit the staging path before the unmount completed. Are we sure this is a regression?
Rakshith, thinking some more on this, I feel there is room for a small enhancement. If the path does not exist ("no such file or directory") when we genuinely attempted the operation, failing looks like the correct action; but if we notice the VolumeAttachment object has a deletion timestamp/finalizer set, can we log that and skip triggering the job instead? Is that already handled?
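The suggested guard can be sketched in shell terms: before running fstrim on the staging path, check that the path still exists and log-and-skip rather than fail when it is gone. This is an illustration only; the real fix belongs in the csi-addons node plugin (Go), and the path below is hypothetical:

```shell
# Hypothetical staging path; in the real node plugin this comes from the
# node reclaim-space request.
STAGING_PATH="${STAGING_PATH:-/tmp/demo-globalmount/volume-handle}"

if [ ! -d "$STAGING_PATH" ]; then
  # Path is gone: the volume was likely unstaged (e.g. the pod was deleted),
  # so log and skip instead of letting fstrim fail with
  # "No such file or directory".
  echo "staging path $STAGING_PATH not found; volume likely unstaged, skipping fstrim"
else
  fstrim "$STAGING_PATH"
fi
```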
VERIFICATION COMMENTS:

Problem Description:
RBD reclaimspace job fails, when the PVC is not mounted, with the following error:

Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7: No such file or directory

Steps to Reproduce:
-------------------
1. Create and attach an RBD PVC of size 25 GiB to an app pod.
2. Get the used size of the RBD pool.
3. Create a file of size 10 GiB.
4. Delete the file.
5. Delete the pod.
6. Create a ReclaimSpaceJob.
7. No errors should be seen in the reclaim space job.

Actual results:
---------------
RBD reclaimspace job fails when the PVC is not mounted

Expected results:
-----------------
RBD reclaimspace job should succeed
_________________________________________________________________

Verified on - 4.13.0-201

ReclaimSpaceJob yaml output:

{'apiVersion': 'csiaddons.openshift.io/v1alpha1',
 'kind': 'ReclaimSpaceJob',
 'metadata': {'creationTimestamp': '2023-05-22T05:38:21Z',
  'generation': 1,
  'name': 'reclaimspacejob-pvc-test-ea5c79ef14774679b3aeb0bdd405211-ea1dbae3eb514c9181bce043a8c3a719',
  'namespace': 'namespace-test-72d1099c9088432a93dae8768',
  'resourceVersion': '2892378',
  'uid': '3f64b843-5bc9-431e-bf9e-4309e4f057c8'},
 'spec': {'backOffLimit': 10,
  'retryDeadlineSeconds': 900,
  'target': {'persistentVolumeClaim': 'pvc-test-ea5c79ef14774679b3aeb0bdd405211'}},
 'status': {'completionTime': '2023-05-22T05:38:49Z',
  'message': 'Reclaim Space operation successfully completed.',
  'result': 'Succeeded',
  'startTime': '2023-05-22T05:38:21Z'}}

must gather - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2155507/
OCS-CI Logs of verification - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2155507/test3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742