Bug 2314454

Summary: [Provider Mode]Error "node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID" in ReclaimSpace Job
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jilju Joy <jijoy>
Component: csi-addons
Assignee: Rewant <resoni>
Status: CLOSED ERRATA
QA Contact: Jilju Joy <jijoy>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.17
CC: odf-bz-bot, resoni
Target Milestone: ---
Keywords: Automation, Regression
Target Release: ODF 4.17.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: isf-provider
Fixed In Version: 4.17.0-117
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2314586
Environment:
Last Closed: 2024-10-30 14:35:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2314586    

Description Jilju Joy 2024-09-24 15:56:43 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

A ReclaimSpaceJob on the client cluster in a provider mode setup failed with the error:

Failed to make node request: node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID


YAML of the ReclaimSpaceJob CR:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: '2024-09-24T15:36:33Z'
  generation: 1
  managedFields:
    - apiVersion: csiaddons.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:backOffLimit': {}
          'f:retryDeadlineSeconds': {}
          'f:target':
            .: {}
            'f:persistentVolumeClaim': {}
      manager: kubectl-create
      operation: Update
      time: '2024-09-24T15:36:33Z'
    - apiVersion: csiaddons.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          .: {}
          'f:completionTime': {}
          'f:conditions': {}
          'f:message': {}
          'f:result': {}
          'f:retries': {}
          'f:startTime': {}
      manager: csi-addons-manager
      operation: Update
      subresource: status
      time: '2024-09-24T15:36:38Z'
  name: reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd
  namespace: namespace-test-225b210758e4476e9160c5de6
  resourceVersion: '2520301'
  uid: 424cb497-db16-4eae-adad-1b42a900dee4
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-c537a2c95fc342daa1001d111af49ed
status:
  completionTime: '2024-09-24T15:36:38Z'
  conditions:
    - lastTransitionTime: '2024-09-24T15:36:38Z'
      message: 'Failed to make node request: node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID'
      observedGeneration: 1
      reason: failed
      status: 'True'
      type: Failed
  message: Maximum retry limit reached
  result: Failed
  retries: 10
  startTime: '2024-09-24T15:36:33Z'
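
For triage, the outcome fields of the CR above can be pulled into one summary. This is a minimal sketch, not part of csi-addons or ocs-ci: the helper name summarize_reclaim_space_job is hypothetical, and the input is assumed to be the CR already loaded as a Python dict (for example from the JSON output of "oc get"). The values are copied from the CR in this report, with metadata trimmed.

```python
def summarize_reclaim_space_job(job: dict) -> dict:
    """Flatten the outcome fields of a ReclaimSpaceJob object (hypothetical helper)."""
    status = job.get("status", {})
    # Keep only messages of conditions that report an active failure.
    failures = [
        cond.get("message", "")
        for cond in status.get("conditions", [])
        if cond.get("type") == "Failed" and cond.get("status") == "True"
    ]
    return {
        "name": job.get("metadata", {}).get("name"),
        "result": status.get("result"),
        "retries": status.get("retries"),
        "message": status.get("message"),
        "failure_messages": failures,
    }


# Fields copied from the CR in this report (managedFields and other metadata omitted).
job = {
    "metadata": {
        "name": "reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-"
                "49754664013948bda430e1e7fef81ddd",
        "namespace": "namespace-test-225b210758e4476e9160c5de6",
    },
    "status": {
        "conditions": [
            {
                "message": 'Failed to make node request: node Client not found '
                           'for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID',
                "reason": "failed",
                "status": "True",
                "type": "Failed",
            }
        ],
        "message": "Maximum retry limit reached",
        "result": "Failed",
        "retries": 10,
    },
}

summary = summarize_reclaim_space_job(job)
print(summary["result"], summary["retries"], summary["failure_messages"])
```

Here the summary shows that the job used all 10 retries (matching backOffLimit: 10) and that the underlying cause is the condition message, not the top-level "Maximum retry limit reached" message.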



Error logs from the pod csi-addons-controller-manager-7d7b754d7-2jwcs:

2024-09-24T15:36:36.020Z ERROR Reconciler error {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "ReclaimSpaceJob": {"name":"reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd","namespace":"namespace-test-225b210758e4476e9160c5de6"}, "namespace": "namespace-test-225b210758e4476e9160c5de6", "name": "reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd", "reconcileID": "52d84f43-a325-4a96-a596-deb92978d87d", "error": "node Client not found for \"hcp417-bm3-aaa-vvgph-sb6fz\" nodeID"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222
2024-09-24T15:36:38.581Z ERROR Failed to make node request {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "ReclaimSpaceJob": {"name":"reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd","namespace":"namespace-test-225b210758e4476e9160c5de6"}, "namespace": "namespace-test-225b210758e4476e9160c5de6", "name": "reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd", "reconcileID": "05360b57-0e8b-4c94-9b7f-f5119df85bac", "PVCName": "pvc-test-c537a2c95fc342daa1001d111af49ed", "PVCNamespace": "namespace-test-225b210758e4476e9160c5de6", "PVName": "pvc-038da1c5-767c-4182-b908-845b1a56d86f", "NodeID": "hcp417-bm3-aaa-vvgph-sb6fz", "Timeout": "3m0s", "error": "node Client not found for \"hcp417-bm3-aaa-vvgph-sb6fz\" nodeID"}
github.com/csi-addons/kubernetes-csi-addons/internal/controller/csiaddons.(*ReclaimSpaceJobReconciler).reconcile
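
The ERROR lines above follow a "timestamp, level, message, JSON fields" layout, so the NodeID and error can be extracted mechanically when scanning a must-gather. The following is a rough sketch under that assumption about the layout; the regex and the helper name parse_controller_log_line are illustrative and not taken from any csi-addons tooling. The sample line is a shortened copy of the second ERROR line above, with some fields dropped.

```python
import json
import re

# Assumed layout: <timestamp> <LEVEL> <message> <JSON object with structured fields>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*?)\s+(?P<fields>\{.*\})\s*$"
)


def parse_controller_log_line(line: str):
    """Return the parsed parts of one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # stack-trace lines and other free text end up here
    try:
        fields = json.loads(match.group("fields"))
    except json.JSONDecodeError:
        return None
    return {
        "timestamp": match.group("ts"),
        "level": match.group("level"),
        "message": match.group("msg"),
        "fields": fields,
    }


# Shortened copy of the second ERROR line in this report.
sample = (
    r'2024-09-24T15:36:38.581Z ERROR Failed to make node request '
    r'{"controller": "reclaimspacejob", '
    r'"PVCName": "pvc-test-c537a2c95fc342daa1001d111af49ed", '
    r'"NodeID": "hcp417-bm3-aaa-vvgph-sb6fz", "Timeout": "3m0s", '
    r'"error": "node Client not found for \"hcp417-bm3-aaa-vvgph-sb6fz\" nodeID"}'
)

parsed = parse_controller_log_line(sample)
print(parsed["level"], parsed["fields"]["NodeID"], parsed["fields"]["error"])
```

Stack-trace lines such as the controller-runtime frames do not match the pattern and are skipped by returning None.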

-------------------------------

Must-gather from client cluster - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ibm-baremetal3/ibm-baremetal3_20240918T135059/logs/failed_testcase_ocs_logs_1727191853/test_rbd_space_reclaim_ocs_logs/hcp417-bm3-aaa/ocs_must_gather/

Must-gather from provider cluster - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ibm-baremetal3/ibm-baremetal3_20240918T135059/logs/failed_testcase_ocs_logs_1727191853/test_rbd_space_reclaim_ocs_logs/ibm-baremetal3/ocs_must_gather/

============================================================

Version of all relevant components (if applicable):
Client cluster ODF 4.17.0-103, OCP 4.17.0-rc.0
Provider cluster ODF 4.17.0-103, OCP 4.16.10

============================================================

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. RBD space reclaim does not work: the ReclaimSpaceJob fails after reaching its retry limit.

============================================================

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

============================================================

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Worked with 4.16.0

==========================================================

Steps to Reproduce:
Run the test case
tests/functional/pv/space_reclaim/test_rbd_space_reclaim.py::TestRbdSpaceReclaim::test_rbd_space_reclaim 
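
For a manual reproduction outside the test suite, a ReclaimSpaceJob with the same spec as the failing CR can be generated as below. This is only a sketch: the helper name build_reclaim_space_job and the sample name, namespace, and PVC name are placeholders, not values from the test case. The apiVersion, kind, and spec fields are the ones shown in the CR in this report, and the PVC is assumed to be an existing RBD-backed PVC on the client cluster.

```python
import json


def build_reclaim_space_job(name, namespace, pvc_name,
                            back_off_limit=10, retry_deadline_seconds=900):
    """Build a ReclaimSpaceJob manifest mirroring the spec in this report."""
    return {
        "apiVersion": "csiaddons.openshift.io/v1alpha1",
        "kind": "ReclaimSpaceJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backOffLimit": back_off_limit,
            "retryDeadlineSeconds": retry_deadline_seconds,
            "target": {"persistentVolumeClaim": pvc_name},
        },
    }


# Placeholder names; replace with a real namespace and RBD PVC.
manifest = build_reclaim_space_job(
    name="reclaimspacejob-sample",
    namespace="my-namespace",
    pvc_name="my-rbd-pvc",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to "oc create -f", since oc and kubectl accept JSON manifests as well as YAML.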

==========================================================
Actual results:
ReclaimSpaceJob failed

Expected results:
ReclaimSpaceJob should complete successfully

Additional info:

Comment 11 errata-xmlrpc 2024-10-30 14:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676