Description of problem (please be as detailed as possible and provide log snippets):

ReclaimSpaceJob is failing for an RBD RWX PVC. The volume mode of the PVC is Block. The error message is given in the YAML output below.

$ oc -n namespace-test-ed830903a70e449fba4545c97 get ReclaimSpaceJob reclaimspacejob-pvc-test-f40b81c2991d4d2a9ff0a19c2745883-ab490a97e0bd4b51b048921718a2498e -o yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: "2022-02-01T13:46:39Z"
  generation: 1
  name: reclaimspacejob-pvc-test-f40b81c2991d4d2a9ff0a19c2745883-ab490a97e0bd4b51b048921718a2498e
  namespace: namespace-test-ed830903a70e449fba4545c97
  resourceVersion: "267293"
  uid: 8e813fe3-6cbb-4451-9336-1ed214b71975
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-f40b81c2991d4d2a9ff0a19c2745883
status:
  completionTime: "2022-02-01T13:46:45Z"
  conditions:
  - lastTransitionTime: "2022-02-01T13:46:45Z"
    message: 'Failed to make node request: multi-node space reclaim is not supported'
    observedGeneration: 1
    reason: failed
    status: "True"
    type: Failed
  message: Maximum retry limit reached
  result: Failed
  retries: 10
  startTime: "2022-02-01T13:46:39Z"

PVC:

$ oc -n namespace-test-ed830903a70e449fba4545c97 get pvc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
      volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
      volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    creationTimestamp: "2022-02-01T13:46:16Z"
    finalizers:
    - kubernetes.io/pvc-protection
    name: pvc-test-f40b81c2991d4d2a9ff0a19c2745883
    namespace: namespace-test-ed830903a70e449fba4545c97
    resourceVersion: "266936"
    uid: 68ba9e48-f504-4ad2-8128-116a2d3f6697
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 25Gi
    storageClassName: storageclass-test-rbd-fd9b16f26d3c4a4386
    volumeMode: Block
    volumeName: pvc-68ba9e48-f504-4ad2-8128-116a2d3f6697
  status:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 25Gi
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Pod:

$ oc -n namespace-test-ed830903a70e449fba4545c97 get pod -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/network-status: |-
        [{
            "name": "openshift-sdn",
            "interface": "eth0",
            "ips": [
                "10.131.0.87"
            ],
            "default": true,
            "dns": {}
        }]
      k8s.v1.cni.cncf.io/networks-status: |-
        [{
            "name": "openshift-sdn",
            "interface": "eth0",
            "ips": [
                "10.131.0.87"
            ],
            "default": true,
            "dns": {}
        }]
      openshift.io/scc: privileged
    creationTimestamp: "2022-02-01T13:46:26Z"
    name: pod-test-rbd-090a972f5785461c9f5db57a848
    namespace: namespace-test-ed830903a70e449fba4545c97
    resourceVersion: "267161"
    uid: 2f503c43-f774-4e92-ae01-f5e114c7bea9
  spec:
    containers:
    - image: quay.io/ocsci/nginx:latest
      imagePullPolicy: IfNotPresent
      name: my-container
      resources: {}
      securityContext:
        capabilities:
          add:
          - SYS_ADMIN
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeDevices:
      - devicePath: /dev/rbdblock
        name: my-volume
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-vsc95
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    imagePullSecrets:
    - name: default-dockercfg-gxwm2
    nodeName: ip-10-0-140-53.us-east-2.compute.internal
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: my-volume
      persistentVolumeClaim:
        claimName: pvc-test-f40b81c2991d4d2a9ff0a19c2745883
    - name: kube-api-access-vsc95
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
        - configMap:
            items:
            - key: service-ca.crt
              path: service-ca.crt
            name: openshift-service-ca.crt
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2022-02-01T13:46:26Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2022-02-01T13:46:34Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2022-02-01T13:46:34Z"
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2022-02-01T13:46:26Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: cri-o://e982e4dd7c6ebb98e4648b7f0a1d8554fab73ab2ebafff68b7f5115b663abca9
      image: quay.io/ocsci/nginx:latest
      imageID: quay.io/ocsci/nginx@sha256:34f3f875e745861ff8a37552ed7eb4b673544d2c56c7cc58f9a9bec5b4b3530e
      lastState: {}
      name: my-container
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2022-02-01T13:46:33Z"
    hostIP: 10.0.140.53
    phase: Running
    podIP: 10.131.0.87
    podIPs:
    - ip: 10.131.0.87
    qosClass: BestEffort
    startTime: "2022-02-01T13:46:26Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

must-gather logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-feb1/jijoy-feb1_20220201T070110/logs/deployment_1643724310/

===============================================================

Version of all relevant components (if applicable):
ODF 4.10.0-132
OCP 4.10.0-0.nightly-2022-01-31-012936

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
RBD space reclaim is not working for RBD RWX PVCs.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:
RBD space reclaim is a new feature.

Steps to Reproduce:
1. Create an RBD RWX PVC (volume mode Block) and attach it to a pod.
2. Create a ReclaimSpaceJob. Example:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: reclaimspacejob-pvc-test-f40b81c2991d4d2a9ff0a19c2745883-ab490a97e0bd4b51b048921718a2498e
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-f40b81c2991d4d2a9ff0a19c2745883

3. Check the result of the ReclaimSpaceJob.

Actual results:
ReclaimSpaceJob result is "Failed"

Expected results:
ReclaimSpaceJob result should be "Succeeded"

Additional info:
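Step 3 boils down to reading the single field `status.result` of the job. A small sketch of checking it from a saved YAML dump with standard tools (the sample file and its path are made up for illustration; it mirrors the failing job's status block above):

```shell
# Minimal sample matching the status block of the failing ReclaimSpaceJob.
cat > /tmp/reclaimspacejob.yaml <<'EOF'
status:
  message: Maximum retry limit reached
  result: Failed
  retries: 10
EOF

# Extract status.result; "Failed" before the fix, "Succeeded" afterwards.
awk '$1 == "result:" {print $2}' /tmp/reclaimspacejob.yaml
# -> Failed
```

Against a live cluster, the same field can be queried directly with
`oc -n <namespace> get reclaimspacejob <name> -o jsonpath='{.status.result}'`.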
For RWX block PVCs there is no safe way to run `blkdiscard` on the device: applications are in charge of maintaining data consistency between multiple readers/writers, and it is not suitable for a CSI driver to write to the device behind the back of an application. The CSI driver can report `not supported` for the NodeReclaimSpace operation, but the controller should continue with running ControllerReclaimSpace.
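The intended behavior can be sketched as follows (an illustration only; the actual controller is Go code in kubernetes-csi-addons, and the function names here are made up): a "not supported" reply from the node request is treated as a skip rather than a failure, and the controller-side reclaim still runs.

```shell
# Simulated NodeReclaimSpace on an RWX block volume: refuses with the
# same message seen in the failing job above.
node_reclaim() {
  echo "multi-node space reclaim is not supported"
  return 1
}

reclaim_space_job() {
  out=$(node_reclaim) || {
    case "$out" in
      *"not supported"*)
        # Not fatal: skip the node stage, keep going.
        echo "node stage skipped: $out" ;;
      *)
        # Any other node error still fails the job.
        echo "Failed: $out"; return 1 ;;
    esac
  }
  # ControllerReclaimSpace runs regardless (e.g. reclaim on the backend).
  echo "Succeeded"
}

reclaim_space_job
```

With this logic, the job above would end with result "Succeeded" instead of exhausting its retries on the node request.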
Upstream change is at https://github.com/csi-addons/kubernetes-csi-addons/pull/113. This will need to be backported to https://github.com/red-hat-storage/kubernetes-csi-addons (main branch) before it can be included in a build for odf-4.10 (release-4.10 branch).
VERIFICATION COMMENTS :-

Steps followed:
1. Create RBD RWX PVC (volume mode Block) and attach it to a pod.
2. Create ReclaimSpaceJob

ReclaimSpaceJob YAML:

oc -n test-reclaimspace get Reclaimspacejob test-reclaimspace -o yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: "2022-03-03T05:50:49Z"
  generation: 1
  name: test-reclaimspace
  namespace: test-reclaimspace
  resourceVersion: "1937589"
  uid: 14f02e70-4ed6-450f-98da-13ec63910e2f
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: reclaim-pvc
status:
  completionTime: "2022-03-03T05:50:52Z"
  message: Reclaim Space operation successfully completed.
  result: Succeeded
  startTime: "2022-03-03T05:50:49Z"

_________

Pod YAML:

oc -n test-reclaimspace get pod test-reclaimspace-pod -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.3.148"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.3.148"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: privileged
  creationTimestamp: "2022-03-03T05:01:10Z"
  name: test-reclaimspace-pod
  namespace: test-reclaimspace
  resourceVersion: "1904962"
  uid: 51f7d405-40d8-4bf8-bbd1-03aeae10138c
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    imagePullPolicy: IfNotPresent
    name: my-container
    resources: {}
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeDevices:
    - devicePath: /dev/rbdblock
      name: my-volume
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-kstj4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-cklls
  nodeName: ip-10-0-181-49.us-east-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: reclaim-pvc
  - name: kube-api-access-kstj4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-03-03T05:01:11Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-03-03T05:01:16Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-03-03T05:01:16Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-03-03T05:01:10Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://8fe83a327264bc77f50437f4bd46fd070bcf77b1b00083531dad545006592ee7
    image: quay.io/ocsci/nginx:latest
    imageID: quay.io/ocsci/nginx@sha256:34f3f875e745861ff8a37552ed7eb4b673544d2c56c7cc58f9a9bec5b4b3530e
    lastState: {}
    name: my-container
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-03-03T05:01:15Z"
  hostIP: 10.0.181.49
  phase: Running
  podIP: 10.129.3.148
  podIPs:
  - ip: 10.129.3.148
  qosClass: BestEffort
  startTime: "2022-03-03T05:01:11Z"

___________

PVC YAML:

oc -n test-reclaimspace get pvc reclaim-pvc -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2022-03-03T04:56:13Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: reclaim-pvc
  namespace: test-reclaimspace
  resourceVersion: "1901611"
  uid: 3ee60387-8a99-40a1-85a0-4252f0cbe5bc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 25Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
  volumeName: pvc-3ee60387-8a99-40a1-85a0-4252f0cbe5bc
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 25Gi
  phase: Bound
(In reply to kmanohar from comment #9)
> [quoted verification output trimmed; identical to comment #9]

Build - 4.10.0-0.nightly-2022-02-26-230022
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372