Description of problem (please be as detailed as possible and provide log snippets):

The ReclaimSpaceJob failed with the error shown in the YAML below.

$ oc get ReclaimSpaceJob pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68 -o yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: "2022-01-27T10:08:09Z"
  generation: 1
  name: pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68
  namespace: namespace-test-2e14c579516441b8a2898a5b9
  resourceVersion: "205710"
  uid: f486830e-128f-4896-a9cf-688a1353bc83
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-171af96ad0864bc7affdbf311a2b528
status:
  completionTime: "2022-01-27T10:08:15Z"
  conditions:
  - lastTransitionTime: "2022-01-27T10:08:15Z"
    message: |
      Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/0001-0011-openshift-storage-000000000000000c-0c247823-7f58-11ec-8674-0a580a830011" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/0001-0011-openshift-storage-000000000000000c-0c247823-7f58-11ec-8674-0a580a830011]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/0001-0011-openshift-storage-000000000000000c-0c247823-7f58-11ec-8674-0a580a830011: No such file or directory
    observedGeneration: 1
    reason: failed
    status: "True"
    type: Failed
  message: Maximum retry limit reached
  result: Failed
  retries: 10
  startTime: "2022-01-27T10:08:09Z"

The PVC is Bound and the app pod status is Running:

$ oc get pvc,pod -n namespace-test-2e14c579516441b8a2898a5b9
NAME                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                               AGE
persistentvolumeclaim/pvc-test-171af96ad0864bc7affdbf311a2b528   Bound    pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d   25Gi       RWO            storageclass-test-rbd-c7bca453f9084a2d82   28m

NAME                                           READY   STATUS    RESTARTS   AGE
pod/pod-test-rbd-8ab1a8f9ae2a4135966068c6200   1/1     Running   0          28m

must-gather logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jan27/jijoy-jan27_20220127T045821/logs/deployment_1643279063/

=====================================================

Version of all relevant components (if applicable):
ODF 4.10.0-122
OCP 4.10.0-0.nightly-2022-01-25-023600

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the RBD space reclaim process is not working.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes, 2/2

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:
New feature in ODF 4.10

====================================================

Steps to Reproduce:
1. Create an RBD PVC and attach it to a pod.
2. Create two files with some content and delete one of them.
3. Create a ReclaimSpaceJob for the PVC.
4. Verify the result of the ReclaimSpaceJob.

ReclaimSpaceJob example yaml:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-171af96ad0864bc7affdbf311a2b528

Actual results:
The result of the ReclaimSpaceJob is "Failed".

Expected results:
The result of the ReclaimSpaceJob should be "Succeeded".

Additional info:
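For illustration only, the node-side failure mode in the status message can be sketched as below. This is a hypothetical helper (`run_fstrim` is not the actual csi-addons code): `fstrim` exits with status 1 and "No such file or directory" when the globalmount staging path is absent on the node that received the request, so a guard on path existence surfaces the same error seen in the job status.

```python
import os
import subprocess


def run_fstrim(staging_path: str) -> str:
    """Run fstrim on a CSI staging path, mirroring the error reported in the
    ReclaimSpaceJob status. Hypothetical sketch, not the csi-addons code."""
    if not os.path.isdir(staging_path):
        # This is the situation in the bug: the node handling the request
        # does not have the volume mounted at the expected staging path,
        # so fstrim cannot open it.
        return (f'failed to execute "fstrim" on "{staging_path}": '
                f"fstrim: cannot open {staging_path}: No such file or directory")
    result = subprocess.run(["fstrim", staging_path],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return (f"fstrim failed (exit status {result.returncode}): "
                f"{result.stderr.strip()}")
    return "Succeeded"
```

Sending the request to a node where the PV is actually staged avoids this branch entirely, which is why node selection matters here.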
Rakshith, comment #0 contains:

PVC is Bound and the app pod status is Running:

$ oc get pvc,pod -n namespace-test-2e14c579516441b8a2898a5b9
NAME                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                               AGE
persistentvolumeclaim/pvc-test-171af96ad0864bc7affdbf311a2b528   Bound    pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d   25Gi       RWO            storageclass-test-rbd-c7bca453f9084a2d82   28m

NAME                                           READY   STATUS    RESTARTS   AGE
pod/pod-test-rbd-8ab1a8f9ae2a4135966068c6200   1/1     Running   0          28m

That suggests there should be a node that has the PV attached and mounted. Can you explain how https://github.com/csi-addons/kubernetes-csi-addons/pull/104 addresses the issue?
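For context on the question above: one way to determine which node has the PV attached is to inspect VolumeAttachment objects (storage.k8s.io/v1). The sketch below shows that lookup logic over VolumeAttachment-style dictionaries; it is an illustration of the general approach, not a claim about what PR #104 actually implements, and `node_for_pv` is a hypothetical helper name.

```python
from typing import Optional


def node_for_pv(volume_attachments: list, pv_name: str) -> Optional[str]:
    """Return the name of the node that has the given PV attached, based on
    VolumeAttachment-style objects. Field names follow the Kubernetes
    storage.k8s.io/v1 VolumeAttachment API (spec.nodeName,
    spec.source.persistentVolumeName, status.attached)."""
    for va in volume_attachments:
        source = va.get("spec", {}).get("source", {})
        if (source.get("persistentVolumeName") == pv_name
                and va.get("status", {}).get("attached")):
            return va["spec"]["nodeName"]
    return None
```

Sending the node request only to the node returned by such a lookup would guarantee the globalmount staging path exists there, avoiding the "No such file or directory" failure.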
Verified using the ocs-ci test case
tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py::TestRbdSpaceReclaim::test_rbd_space_reclaim
added in PR https://github.com/red-hat-storage/ocs-ci/pull/5327.

Test run:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/9559/

Test case logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-feb1/jijoy-feb1_20220201T070110/logs/ocs-ci-logs-1643713953/tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py/TestRbdSpaceReclaim/test_rbd_space_reclaim/logs

This was also verified manually by following the steps given in comment #0.

Verified in version:
ODF 4.10.0-132
OCP 4.10.0-0.nightly-2022-01-31-012936
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372