Bug 2047162 - ReclaimSpaceJob failing, fstrim is executed on a non-existing mountpoint/directory
Summary: ReclaimSpaceJob failing, fstrim is executed on a non-existing mountpoint/directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-addons
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Rakshith
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-27 10:35 UTC by Jilju Joy
Modified: 2023-08-09 16:37 UTC
CC List: 7 users

Fixed In Version: 4.10.0-132
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:52:21 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2046766 1 unspecified CLOSED [IBM Z]: csi-rbdplugin pods failed to come up due to ImagePullBackOff from the "csiaddons" registry 2023-08-09 17:00:43 UTC
Red Hat Product Errata RHSA-2022:1372 0 None None None 2022-04-13 18:52:29 UTC

Description Jilju Joy 2022-01-27 10:35:18 UTC
Description of problem (please be as detailed as possible and provide log snippets):
ReclaimSpaceJob failed with the error shown in the yaml below.

$ oc get ReclaimSpaceJob pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68 -o yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: "2022-01-27T10:08:09Z"
  generation: 1
  name: pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68
  namespace: namespace-test-2e14c579516441b8a2898a5b9
  resourceVersion: "205710"
  uid: f486830e-128f-4896-a9cf-688a1353bc83
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-171af96ad0864bc7affdbf311a2b528
status:
  completionTime: "2022-01-27T10:08:15Z"
  conditions:
  - lastTransitionTime: "2022-01-27T10:08:15Z"
    message: |
      Failed to make node request: failed to execute "fstrim" on "/var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/0001-0011-openshift-storage-000000000000000c-0c247823-7f58-11ec-8674-0a580a830011" (an error (exit status 1) occurred while running fstrim args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/0001-0011-openshift-storage-000000000000000c-0c247823-7f58-11ec-8674-0a580a830011]): fstrim: cannot open /var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/0001-0011-openshift-storage-000000000000000c-0c247823-7f58-11ec-8674-0a580a830011: No such file or directory
    observedGeneration: 1
    reason: failed
    status: "True"
    type: Failed
  message: Maximum retry limit reached
  result: Failed
  retries: 10
  startTime: "2022-01-27T10:08:09Z"
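
The failure shows fstrim being pointed at a staging path that does not exist on the node where the request ran. As a rough diagnostic (the node name is a placeholder, not taken from the must-gather), the path can be checked directly on a node:

$ oc get pod pod-test-rbd-8ab1a8f9ae2a4135966068c6200 -n namespace-test-2e14c579516441b8a2898a5b9 -o wide
$ oc debug node/<node-name> -- chroot /host ls /var/lib/kubelet/plugins/kubernetes.io/csi/pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d/globalmount/

The first command shows which node runs the app pod; the second lists the globalmount directory on that node. If the listing fails on the node that executed the fstrim request, it matches the error above.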


PVC is Bound and the app pod status is Running:
$ oc get pvc,pod -n namespace-test-2e14c579516441b8a2898a5b9
NAME                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                               AGE
persistentvolumeclaim/pvc-test-171af96ad0864bc7affdbf311a2b528   Bound    pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d   25Gi       RWO            storageclass-test-rbd-c7bca453f9084a2d82   28m

NAME                                           READY   STATUS    RESTARTS   AGE
pod/pod-test-rbd-8ab1a8f9ae2a4135966068c6200   1/1     Running   0          28m


must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jan27/jijoy-jan27_20220127T045821/logs/deployment_1643279063/
=====================================================
Version of all relevant components (if applicable):
ODF 4.10.0-122
4.10.0-0.nightly-2022-01-25-023600

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes, the RBD space reclaim process is not working.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes, 2/2 attempts.

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:
Not a regression; space reclamation is a new feature in ODF 4.10.

====================================================
Steps to Reproduce:
1. Create an RBD PVC and attach it to a pod (see the example manifests below).
2. Create two files with some content and delete one of them.
3. Create a ReclaimSpaceJob for the PVC (example yaml below).
4. Verify the result of the ReclaimSpaceJob.
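
For steps 1 and 2, manifests and commands of roughly the following shape can be used (all names, the image, and the storage class are placeholders, not the exact objects from this report):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
  storageClassName: <rbd-storage-class>
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /mnt/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-test

# Step 2: create two files with some content, then delete one of them
$ oc exec pod-test-rbd -- dd if=/dev/zero of=/mnt/data/file1 bs=1M count=512
$ oc exec pod-test-rbd -- dd if=/dev/zero of=/mnt/data/file2 bs=1M count=512
$ oc exec pod-test-rbd -- rm /mnt/data/file1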

ReclaimSpaceJob example yaml:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-171af96ad0864bc7affdbf311a2b528
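
Once the job finishes, the result can be read from the status (a quick check, using the job name and namespace from above):

$ oc get reclaimspacejob pvc-test-171af96ad0864bc7affdbf311a2b528-reclaim-space-job-8d8a554193d944609dee74504ba70b68 \
    -n namespace-test-2e14c579516441b8a2898a5b9 -o jsonpath='{.status.result}{"\n"}'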


Actual results:
The result of the ReclaimSpaceJob is "Failed".

Expected results:
The result of the ReclaimSpaceJob should be "Succeeded".

Additional info:

Comment 5 Niels de Vos 2022-01-27 16:17:25 UTC
Rakshith, comment #0 contains:

PVC is Bound and the app pod status is Running:
$ oc get pvc,pod -n namespace-test-2e14c579516441b8a2898a5b9
NAME                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                               AGE
persistentvolumeclaim/pvc-test-171af96ad0864bc7affdbf311a2b528   Bound    pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d   25Gi       RWO            storageclass-test-rbd-c7bca453f9084a2d82   28m

NAME                                           READY   STATUS    RESTARTS   AGE
pod/pod-test-rbd-8ab1a8f9ae2a4135966068c6200   1/1     Running   0          28m



That suggests there should be a node that has the PV attached and mounted. Can you explain how https://github.com/csi-addons/kubernetes-csi-addons/pull/104 addresses the issue?
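
For reference, generic commands (not taken from the must-gather) to identify the node that has the PV attached:

$ oc get volumeattachment | grep pvc-2e5d0043-740e-439b-886d-fe38abcd9a1d
$ oc get pod pod-test-rbd-8ab1a8f9ae2a4135966068c6200 -n namespace-test-2e14c579516441b8a2898a5b9 -o wide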

Comment 11 Jilju Joy 2022-02-01 14:56:20 UTC
Verified using the ocs-ci test case tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py::TestRbdSpaceReclaim::test_rbd_space_reclaim added in the PR
https://github.com/red-hat-storage/ocs-ci/pull/5327.

Test run: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/9559/
Test case logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-feb1/jijoy-feb1_20220201T070110/logs/ocs-ci-logs-1643713953/tests/manage/pv_services/space_reclaim/test_rbd_space_reclaim.py/TestRbdSpaceReclaim/test_rbd_space_reclaim/logs

This was also verified manually by following the steps given in comment #0.


Verified in version:
ODF 4.10.0-132
OCP 4.10.0-0.nightly-2022-01-31-012936

Comment 16 errata-xmlrpc 2022-04-13 18:52:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

