Description of problem (please be as detailed as possible and provide log snippets):

ReclaimSpaceJob failed with the error "Controller and Node Client not found". The PVC is in Bound state and the app pod to which the PVC is attached is in Running state.

$ oc get ReclaimSpaceJob reclaim-pvcrbd1 -o yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: "2022-01-20T13:16:12Z"
  generation: 1
  name: reclaim-pvcrbd1
  namespace: test-project
  resourceVersion: "84469"
  uid: b3507d9e-8042-4222-aad0-8bc81b1bd8ac
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvcrbd1
status:
  completionTime: "2022-01-20T13:16:17Z"
  conditions:
  - lastTransitionTime: "2022-01-20T13:16:17Z"
    message: Controller and Node Client not found
    observedGeneration: 1
    reason: failed
    status: "True"
    type: Failed
  message: Maximum retry limit reached
  reclaimedSpace: "0"
  result: Failed
  retries: 10
  startTime: "2022-01-20T13:16:12Z"

$ oc get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
pvcrbd    Bound    pvc-31ae909a-697d-497c-a596-9e1d9906071b   1Mi        RWO            ocs-storagecluster-ceph-rbd   46m
pvcrbd1   Bound    pvc-60b66128-9d0c-449a-824f-9b4ed698a23a   5Gi        RWO            ocs-storagecluster-ceph-rbd   2m48s

============================================================

Version of all relevant components (if applicable):
ODF 4.10.0-113
OCP 4.10.0-0.nightly-2022-01-19-150530

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the RBD Reclaim Space feature is not working.

Is there any workaround available to the best of your knowledge?
Add this to the configmap rook-ceph-operator-config:

data:
  CSI_ENABLE_CSIADDONS: "true"

Rakshith suggested this workaround. ReclaimSpaceJob succeeded after adding this.

============================================

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create an RBD PVC and attach it to a pod.
2. Create a ReclaimSpaceJob.
3. Check the status of the ReclaimSpaceJob.

Sample ReclaimSpaceJob yaml:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: reclaim-pvcrbd1
spec:
  target:
    persistentVolumeClaim: pvcrbd1
  backOffLimit: 10
  retryDeadlineSeconds: 900

Actual results:
ReclaimSpaceJob Failed

Expected results:
ReclaimSpaceJob should succeed.

Additional info:
logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jan20/jijoy-jan20_20220120T111158/logs/deployment_1642687226/
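For step 3 of the reproduction, the result can be read straight from the job's status instead of scanning the full YAML. This is only a rough sketch: job_outcome and check_reclaim_job are hypothetical helper names, the job name and namespace are the ones from this report, and "Succeeded" as the success value is an assumption (only "Failed" appears in the output above).

```shell
# Map a ReclaimSpaceJob .status.result value to a simple outcome.
# "Failed" is taken from the report; "Succeeded" is an assumed success value.
job_outcome() {
  case "$1" in
    Succeeded) echo "succeeded" ;;
    Failed)    echo "failed" ;;
    *)         echo "pending" ;;   # empty or unknown: job not finished yet
  esac
}

# Hypothetical wrapper: reads the result field from the cluster and classifies it.
check_reclaim_job() {
  result=$(oc -n test-project get ReclaimSpaceJob reclaim-pvcrbd1 \
    -o jsonpath='{.status.result}')
  job_outcome "$result"
}
```

Calling check_reclaim_job against the cluster in this report would be expected to print "failed" until the workaround is applied.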
Rook is not configured to deploy Ceph-CSI with the csi-addons sidecar by default. The ConfigMap rook-ceph-operator-config needs to have the `CSI_ENABLE_CSIADDONS: "true"` parameter set. This is a limitation currently inherited from the (upstream) Rook deployment. It should be possible to have this adjusted by OCS-Operator. Enabling the feature by default has my support :-)
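Until the default changes, the parameter can be set with a merge patch rather than by editing the ConfigMap by hand. A minimal sketch, assuming the ConfigMap lives in the openshift-storage namespace (as in the verification output in this bug); csiaddons_patch and enable_csiaddons are hypothetical helper names, not part of oc or Rook.

```shell
# Build the JSON merge patch that sets CSI_ENABLE_CSIADDONS under .data.
csiaddons_patch() {
  printf '{"data":{"CSI_ENABLE_CSIADDONS":"%s"}}\n' "$1"
}

# Apply the patch to rook-ceph-operator-config.
# The namespace is an assumption; adjust it for other deployments.
enable_csiaddons() {
  oc -n openshift-storage patch configmap rook-ceph-operator-config \
    --type merge -p "$(csiaddons_patch true)"
}
```

A merge patch only touches the one key, so the other entries in the ConfigMap data are left alone.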
Verified in version:
ODF 4.10.0-156
OCP 4.10.0-0.nightly-2022-02-15-041303
Tested in AWS

The CSI_ENABLE_CSIADDONS parameter in the configmap 'rook-ceph-operator-config' is set to "true" as its default value.

$ oc -n openshift-storage get configmap rook-ceph-operator-config -o yaml
apiVersion: v1
data:
  CSI_ENABLE_CSIADDONS: "true"
  CSI_LOG_LEVEL: "5"
  CSI_PLUGIN_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-16T06:31:17Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "33134"
  uid: 40f327bd-6d35-4136-8593-a343db56e123
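The same verification can be narrowed to the single key with a jsonpath query. This is just a sketch of one way to do it; csiaddons_value and csiaddons_enabled are hypothetical helper names.

```shell
# Read only the CSI_ENABLE_CSIADDONS value from the ConfigMap.
csiaddons_value() {
  oc -n openshift-storage get configmap rook-ceph-operator-config \
    -o jsonpath='{.data.CSI_ENABLE_CSIADDONS}'
}

# Succeeds (exit status 0) only when the given value is exactly "true".
csiaddons_enabled() {
  [ "$1" = "true" ]
}
```

For example, `csiaddons_enabled "$(csiaddons_value)" && echo enabled` prints "enabled" only when the parameter is set.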
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372
Hi Jilju, Do we have any automation coverage for this BZ? If not, can we consider this part of the automation backlog?
(In reply to Ramakrishnan Periyasamy from comment #12)
> Hi Jilju, Do we have any automation coverage for this BZ? If not, can we
> consider this part of the automation backlog?

We have ReclaimSpace tests automated, which will pass only if the value of CSI_ENABLE_CSIADDONS is 'true'. So I think this bug can be considered covered by the existing tests.