Description of problem (please be detailed as possible and provide log snippets):
---------------------------------------------------------------------------------
RBD reclaimspace job fails, when the PVC is not mounted, with the following error:

Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7: No such file or directory

Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.12.0-0.nightly-2022-12-01-184212
ODF: 4.12.0-122

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)? 3

Can this issue be reproduced? Yes

Can this issue reproduce from the UI? Yes

If this is a regression, please provide more details to justify this:
Yes, the test passed in ODF 4.12.0-91

Steps to Reproduce:
-------------------
1. Create and attach an RBD PVC of size 25 GiB to an app pod.
2. Get the used size of the RBD pool.
3. Create a file of size 10 GiB.
4. Delete the file.
5. Delete the pod.
6. Create a ReclaimSpaceJob.
7. No errors should be seen in the reclaim space job.

OCS-CI test: https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py#L199

Actual results:
---------------
RBD reclaimspace job fails when the PVC is not mounted

Expected results:
-----------------
RBD reclaimspace job should succeed
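The ReclaimSpaceJob created in step 6 can be sketched as below. The PVC name, namespace, and job name are placeholders, not values from this bug; the backOffLimit/retryDeadlineSeconds values mirror those seen in the verification output further down:

```shell
# Write a sample ReclaimSpaceJob manifest (hypothetical names).
cat > reclaimspacejob.yaml <<'EOF'
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: reclaimspacejob-sample
  namespace: my-namespace
spec:
  target:
    persistentVolumeClaim: rbd-pvc-25gib   # the RBD PVC from step 1
  backOffLimit: 10
  retryDeadlineSeconds: 900
EOF

# On a cluster with the csi-addons operator, apply it and check the result:
#   kubectl apply -f reclaimspacejob.yaml
#   kubectl -n my-namespace get reclaimspacejob reclaimspacejob-sample \
#     -o jsonpath='{.status.result}'
echo "manifest written"
```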
Racheal, as an initial analysis, I am wondering whether this is really a regression. When you deleted the pod, the global mount (staging) path was expected to be torn down, and a job running against that path would be expected to fail. This should have behaved the same way in the previous release, unless the reclaimspace job there was simply fast enough to hit the staging path before the unmount completed. Are we sure this is a regression?
Rakshith, thinking some more on this, I feel there is room for a small enhancement. If the path does not exist ("no such file or directory") when we genuinely attempted the operation, failing looks like the correct action; but if we notice the VolumeAttachment object has a deletion timestamp/finalizer set, can we log that and skip triggering the job instead? Is that already handled?
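The suggested guard can be sketched in shell terms: before running fstrim on the staging path, check that the path still exists and log-and-skip rather than fail when it is gone. This is an illustration only; the real fix belongs in the csi-addons node plugin (Go), and the path below is hypothetical:

```shell
# Hypothetical staging path; in the real node plugin this comes from the
# node reclaim-space request.
STAGING_PATH="${STAGING_PATH:-/tmp/demo-globalmount/volume-handle}"

if [ ! -d "$STAGING_PATH" ]; then
  # Path is gone: the volume was likely unstaged (e.g. the pod was deleted),
  # so log and skip instead of letting fstrim fail with
  # "No such file or directory".
  echo "staging path $STAGING_PATH not found; volume likely unstaged, skipping fstrim"
else
  fstrim "$STAGING_PATH"
fi
```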
VERIFICATION COMMENTS:

Problem Description:
RBD reclaimspace job fails, when the PVC is not mounted, with the following error:

Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/8fa61375d1eb83be19dd45b7e9dde5b67e637171effe8d8defbaa0796d04c350/globalmount/0001-0011-openshift-storage-0000000000000017-7db30de2-71c5-46cc-ab2e-b5a0c8c3d4e7: No such file or directory

Steps to Reproduce:
-------------------
1. Create and attach an RBD PVC of size 25 GiB to an app pod.
2. Get the used size of the RBD pool.
3. Create a file of size 10 GiB.
4. Delete the file.
5. Delete the pod.
6. Create a ReclaimSpaceJob.
7. No errors should be seen in the reclaim space job.

Actual results:
---------------
RBD reclaimspace job fails when the PVC is not mounted

Expected results:
-----------------
RBD reclaimspace job should succeed
_________________________________________________________________

Verified on - 4.13.0-201

ReclaimSpaceJob yaml output:

{'apiVersion': 'csiaddons.openshift.io/v1alpha1',
 'kind': 'ReclaimSpaceJob',
 'metadata': {'creationTimestamp': '2023-05-22T05:38:21Z',
  'generation': 1,
  'name': 'reclaimspacejob-pvc-test-ea5c79ef14774679b3aeb0bdd405211-ea1dbae3eb514c9181bce043a8c3a719',
  'namespace': 'namespace-test-72d1099c9088432a93dae8768',
  'resourceVersion': '2892378',
  'uid': '3f64b843-5bc9-431e-bf9e-4309e4f057c8'},
 'spec': {'backOffLimit': 10,
  'retryDeadlineSeconds': 900,
  'target': {'persistentVolumeClaim': 'pvc-test-ea5c79ef14774679b3aeb0bdd405211'}},
 'status': {'completionTime': '2023-05-22T05:38:49Z',
  'message': 'Reclaim Space operation successfully completed.',
  'result': 'Succeeded',
  'startTime': '2023-05-22T05:38:21Z'}}

must gather - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2155507/
OCS-CI Logs of verification - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2155507/test3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742