Bug 2314454 - [Provider Mode]Error "node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID" in ReclaimSpace Job
Summary: [Provider Mode]Error "node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-addons
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Rewant
QA Contact: Jilju Joy
URL:
Whiteboard: isf-provider
Depends On:
Blocks: 2314586
Reported: 2024-09-24 15:56 UTC by Jilju Joy
Modified: 2024-10-30 14:36 UTC
CC List: 2 users

Fixed In Version: 4.17.0-117
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2314586
Environment:
Last Closed: 2024-10-30 14:35:59 UTC
Embargoed:


Links
Github red-hat-storage/kubernetes-csi-addons pull 209 (open): "Bug 2314454: refactor parseEndpoint to accept pod names with '.' in it", last updated 2024-09-25 06:21:48 UTC
Github red-hat-storage/kubernetes-csi-addons pull 215 (open): "Bug 2314454: sanitize connection pool key pod name", last updated 2024-10-07 11:53:50 UTC
Red Hat Issue Tracker OCSBZM-9318, last updated 2024-10-07 05:49:03 UTC
Red Hat Product Errata RHSA-2024:8676, last updated 2024-10-30 14:36:07 UTC
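
The title of pull 209 ("refactor parseEndpoint to accept pod names with '.' in it") suggests the endpoint parser refused sidecar pod names containing '.'. Below is a minimal Go sketch of a parser that tolerates such names. It assumes endpoints take the form "pod://<pod-name>.<namespace>:<port>" and splits the hostname at the LAST '.', which is safe because a Kubernetes namespace name is a DNS-1123 label and cannot contain '.', while a pod name may. The strictSplit and parsePodEndpoint functions and the example endpoint are hypothetical; this is not the code from the pull request.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// strictSplit is a hypothetical "<pod>.<namespace>" split that rejects any
// extra '.', illustrating how a pod name containing '.' could be refused.
func strictSplit(host string) (namespace, podName string, err error) {
	parts := strings.Split(host, ".")
	if len(parts) != 2 {
		return "", "", fmt.Errorf("hostname %q is not in <pod>.<namespace> format", host)
	}
	return parts[1], parts[0], nil
}

// parsePodEndpoint is a hypothetical parser for "pod://<pod-name>.<namespace>:<port>".
// The namespace is the text after the LAST '.', since namespace names cannot
// contain '.', and everything before it is the pod name.
func parsePodEndpoint(rawURL string) (namespace, podName, port string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", "", fmt.Errorf("failed to parse endpoint %q: %w", rawURL, err)
	}
	if u.Scheme != "pod" {
		return "", "", "", fmt.Errorf("endpoint scheme %q not supported", u.Scheme)
	}
	host := u.Hostname()
	i := strings.LastIndex(host, ".")
	if i <= 0 || i == len(host)-1 {
		return "", "", "", fmt.Errorf("hostname %q is not in <pod>.<namespace> format", host)
	}
	return host[i+1:], host[:i], u.Port(), nil
}

func main() {
	// Hypothetical endpoint whose pod name contains dots.
	raw := "pod://csi-addons.node.a.openshift-storage:9070"

	u, _ := url.Parse(raw)
	_, _, err := strictSplit(u.Hostname())
	fmt.Println("strict split:", err)

	ns, pod, port, err := parsePodEndpoint(raw)
	fmt.Println("last-dot split:", ns, pod, port, err)
}
```

The strict split rejects the hostname outright, while the last-dot split recovers namespace "openshift-storage", pod "csi-addons.node.a", and port "9070".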

Description Jilju Joy 2024-09-24 15:56:43 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

A ReclaimSpaceJob on the client cluster in a provider mode setup failed with the following error:

Failed to make node request: node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID


YAML of the ReclaimSpaceJob CR:

apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  creationTimestamp: '2024-09-24T15:36:33Z'
  generation: 1
  managedFields:
    - apiVersion: csiaddons.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:backOffLimit': {}
          'f:retryDeadlineSeconds': {}
          'f:target':
            .: {}
            'f:persistentVolumeClaim': {}
      manager: kubectl-create
      operation: Update
      time: '2024-09-24T15:36:33Z'
    - apiVersion: csiaddons.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          .: {}
          'f:completionTime': {}
          'f:conditions': {}
          'f:message': {}
          'f:result': {}
          'f:retries': {}
          'f:startTime': {}
      manager: csi-addons-manager
      operation: Update
      subresource: status
      time: '2024-09-24T15:36:38Z'
  name: reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd
  namespace: namespace-test-225b210758e4476e9160c5de6
  resourceVersion: '2520301'
  uid: 424cb497-db16-4eae-adad-1b42a900dee4
spec:
  backOffLimit: 10
  retryDeadlineSeconds: 900
  target:
    persistentVolumeClaim: pvc-test-c537a2c95fc342daa1001d111af49ed
status:
  completionTime: '2024-09-24T15:36:38Z'
  conditions:
    - lastTransitionTime: '2024-09-24T15:36:38Z'
      message: 'Failed to make node request: node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID'
      observedGeneration: 1
      reason: failed
      status: 'True'
      type: Failed
  message: Maximum retry limit reached
  result: Failed
  retries: 10
  startTime: '2024-09-24T15:36:33Z'



Error logs from the csi-addons-controller-manager-7d7b754d7-2jwcs pod:

2024-09-24T15:36:36.020Z ERROR Reconciler error {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "ReclaimSpaceJob": {"name":"reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd","namespace":"namespace-test-225b210758e4476e9160c5de6"}, "namespace": "namespace-test-225b210758e4476e9160c5de6", "name": "reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd", "reconcileID": "52d84f43-a325-4a96-a596-deb92978d87d", "error": "node Client not found for \"hcp417-bm3-aaa-vvgph-sb6fz\" nodeID"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222
2024-09-24T15:36:38.581Z ERROR Failed to make node request {"controller": "reclaimspacejob", "controllerGroup": "csiaddons.openshift.io", "controllerKind": "ReclaimSpaceJob", "ReclaimSpaceJob": {"name":"reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd","namespace":"namespace-test-225b210758e4476e9160c5de6"}, "namespace": "namespace-test-225b210758e4476e9160c5de6", "name": "reclaimspacejob-pvc-test-c537a2c95fc342daa1001d111af49ed-49754664013948bda430e1e7fef81ddd", "reconcileID": "05360b57-0e8b-4c94-9b7f-f5119df85bac", "PVCName": "pvc-test-c537a2c95fc342daa1001d111af49ed", "PVCNamespace": "namespace-test-225b210758e4476e9160c5de6", "PVName": "pvc-038da1c5-767c-4182-b908-845b1a56d86f", "NodeID": "hcp417-bm3-aaa-vvgph-sb6fz", "Timeout": "3m0s", "error": "node Client not found for \"hcp417-bm3-aaa-vvgph-sb6fz\" nodeID"}
github.com/csi-addons/kubernetes-csi-addons/internal/controller/csiaddons.(*ReclaimSpaceJobReconciler).reconcile
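
The error in the logs above is reported when the reconciler finds no connection to a csi-addons sidecar for the PV's node ID. Below is a minimal Go sketch of such a lookup, assuming connections live in a pool that is populated only after a sidecar endpoint has been registered successfully. One plausible reading of the two pull request titles is that registration or the pool key went wrong for pod names containing '.', leaving nothing to match for the node. The nodeConn and connPool types, the nodeClientFor function, the pool key, and the driver name are all hypothetical and are not the csi-addons API.

```go
package main

import "fmt"

// nodeConn is a hypothetical stand-in for a connection to the csi-addons
// sidecar running next to a CSI node plugin.
type nodeConn struct {
	driver string
	nodeID string
}

// connPool is a hypothetical registry of sidecar connections, keyed by a
// string derived from the sidecar pod (for example "<namespace>/<pod-name>").
type connPool struct {
	conns map[string]*nodeConn
}

// getByNodeID returns every registered connection for the given driver and node.
func (p *connPool) getByNodeID(driver, nodeID string) []*nodeConn {
	var matches []*nodeConn
	for _, c := range p.conns {
		if c.driver == driver && c.nodeID == nodeID {
			matches = append(matches, c)
		}
	}
	return matches
}

// nodeClientFor returns a connection for the node, or an error shaped like
// the one in the controller log when nothing is registered for it.
func nodeClientFor(p *connPool, driver, nodeID string) (*nodeConn, error) {
	matches := p.getByNodeID(driver, nodeID)
	if len(matches) == 0 {
		return nil, fmt.Errorf("node Client not found for %q nodeID", nodeID)
	}
	return matches[0], nil
}

func main() {
	const driver = "rbd.csi.example.com" // hypothetical driver name
	pool := &connPool{conns: map[string]*nodeConn{}}

	// Nothing was registered for the node, so the lookup fails.
	_, err := nodeClientFor(pool, driver, "hcp417-bm3-aaa-vvgph-sb6fz")
	fmt.Println(err)

	// Once a sidecar is registered for the node, the same lookup succeeds.
	pool.conns["openshift-storage/csi-rbdplugin-example"] = &nodeConn{driver: driver, nodeID: "hcp417-bm3-aaa-vvgph-sb6fz"}
	c, err := nodeClientFor(pool, driver, "hcp417-bm3-aaa-vvgph-sb6fz")
	fmt.Println(c.nodeID, err)
}
```

The first lookup prints the same message as the log, node Client not found for "hcp417-bm3-aaa-vvgph-sb6fz" nodeID; the second returns the registered connection.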

-------------------------------

Must-gather from client cluster - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ibm-baremetal3/ibm-baremetal3_20240918T135059/logs/failed_testcase_ocs_logs_1727191853/test_rbd_space_reclaim_ocs_logs/hcp417-bm3-aaa/ocs_must_gather/

Must-gather from provider cluster - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ibm-baremetal3/ibm-baremetal3_20240918T135059/logs/failed_testcase_ocs_logs_1727191853/test_rbd_space_reclaim_ocs_logs/ibm-baremetal3/ocs_must_gather/

============================================================

Version of all relevant components (if applicable):
Client cluster ODF 4.17.0-103, OCP 4.17.0-rc.0
Provider cluster ODF 4.17.0-103, OCP 4.16.10

============================================================

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. RBD space reclaim does not work.

============================================================

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

============================================================

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
It worked with 4.16.0.

==========================================================

Steps to Reproduce:
Run the test case:
tests/functional/pv/space_reclaim/test_rbd_space_reclaim.py::TestRbdSpaceReclaim::test_rbd_space_reclaim

==========================================================
Actual results:
The ReclaimSpaceJob failed.

Expected results:
The ReclaimSpaceJob should complete successfully.

Additional info:

Comment 11 errata-xmlrpc 2024-10-30 14:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

