Description of problem:
==============================
On one of our long-running bare-metal (BM) clusters, where we have been performing multiple tasks such as inducing failures and key rotation, we observed that the status-reporter pod has been in CrashLoopBackOff (CLBO) for multiple days. The setup also includes OCP clients, HCP clients, etc.

>> oc get pods -o wide|grep status
storageclient-737342087af10580-status-reporter-28624719-ftc49   0/1   ContainerStatusUnknown   4 (3m21s ago)   4m49s   10.130.1.83   baremetal1-03   <none>   <none>
storageclient-737342087af10580-status-reporter-28624722-8d74p   0/1   ContainerStatusUnknown   4 (44s ago)     2m14s   10.130.1.86   baremetal1-03   <none>   <none>
storageclient-737342087af10580-status-reporter-28624722-9qzfg   0/1   CrashLoopBackOff         1 (9s ago)      11s     10.130.1.87   baremetal1-03   <none>   <none>

>> date ; oc logs storageclient-737342087af10580-status-reporter-28624722-9qzfg
Tue Jun 4 12:15:02 IST 2024
W0604 06:44:52.170711 1 main.go:160] Failed to get clusterDNS "cluster": dnses.config.openshift.io "cluster" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "dnses" in API group "config.openshift.io" at the cluster scope
W0604 06:44:52.170828 1 main.go:164] Cluster Base Domain is empty.
F0604 06:44:52.233151 1 main.go:142] Failed to update mon configmap for storageClient 85d4a535-2380-4bda-a42c-727777f27be6: failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "configmaps" in API group "" in the namespace "openshift-storage"

Pod description excerpt (storageclient-737342087af10580-status-reporter-28624725-ccvt6):
Containers:
  heartbeat:
    Container ID:   cri-o://2fd250da9b45c40d71c6ee6f0b586477785e8d3ab7f5e69b72b8bf0f5f6fca1a
    Image:          registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59
    Image ID:       registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59
    Port:           <none>
    Host Port:      <none>
    Command:
      /status-reporter
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 04 Jun 2024 12:15:22 +0530
      Finished:     Tue, 04 Jun 2024 12:15:22 +0530
    Ready:          False
    Restart Count:  2
    Environment:
      STORAGE_CLIENT_NAME:  ocs-storagecluster
      OPERATOR_NAMESPACE:   openshift-storage
Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       39s                default-scheduler  Successfully assigned openshift-storage/storageclient-737342087af10580-status-reporter-28624725-ccvt6 to baremetal1-03
  Normal   AddedInterface  40s                multus             Add eth0 [10.130.1.88/23] from ovn-kubernetes
  Normal   Pulled          24s (x3 over 40s)  kubelet            Container image "registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59" already present on machine
  Normal   Created         23s (x3 over 40s)  kubelet            Created container heartbeat
  Normal   Started         23s (x3 over 40s)  kubelet            Started container heartbeat
  Warning  BackOff         11s (x4 over 38s)  kubelet            Back-off restarting failed container heartbeat in pod storageclient-737342087af10580-status-reporter-28624725-ccvt6_openshift-storage(491a9b6a-064b-4058-94de-5da18c602a71)
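Both failing calls in the status-reporter log are RBAC denials for the ocs-client-operator-status-reporter service account: one cluster-scoped (dnses in config.openshift.io) and one namespaced (configmaps in openshift-storage). The following is a minimal Python sketch, under stated assumptions, that pulls the verb, resource, API group and scope out of such a message and shows the PolicyRule shape that would cover that single request. The names `parse_forbidden` and `MissingPermission` are hypothetical, and the regex is modeled only on the two messages quoted in this report, not on a documented API server contract.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: modeled on the two "forbidden" messages in the status-reporter
# log; not a versioned format, so treat a non-match as "unknown", not "allowed".
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)"'
    r' in API group "(?P<group>[^"]*)"'
    r'(?: in the namespace "(?P<namespace>[^"]+)"| at the cluster scope)'
)


@dataclass(frozen=True)
class MissingPermission:
    """One denied request as reported by the API server (hypothetical helper type)."""

    user: str
    verb: str
    resource: str
    api_group: str  # "" is the core API group (e.g. configmaps)
    namespace: Optional[str]  # None when the request was at the cluster scope

    @property
    def needs_cluster_role(self) -> bool:
        # A RoleBinding only grants access inside its namespace, so a
        # cluster-scoped request needs a ClusterRole + ClusterRoleBinding.
        return self.namespace is None

    def as_rbac_rule(self) -> dict:
        """PolicyRule-shaped dict covering exactly this one request."""
        return {
            "apiGroups": [self.api_group],
            "resources": [self.resource],
            "verbs": [self.verb],
        }


def parse_forbidden(log_line: str) -> Optional[MissingPermission]:
    """Return the denied permission found in log_line, or None if there is none."""
    m = FORBIDDEN_RE.search(log_line)
    if m is None:
        return None
    return MissingPermission(
        user=m["user"],
        verb=m["verb"],
        resource=m["resource"],
        api_group=m["group"],
        namespace=m["namespace"],  # None if the cluster-scope branch matched
    )


if __name__ == "__main__":
    sa = "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter"
    samples = [
        f'dnses.config.openshift.io "cluster" is forbidden: User "{sa}" cannot get'
        ' resource "dnses" in API group "config.openshift.io" at the cluster scope',
        f'configmaps "ceph-csi-configs" is forbidden: User "{sa}" cannot get'
        ' resource "configmaps" in API group "" in the namespace "openshift-storage"',
    ]
    for line in samples:
        p = parse_forbidden(line)
        scope = "cluster" if p.needs_cluster_role else f"namespace {p.namespace}"
        print(scope, p.as_rbac_rule())
```

Run on the two messages above, it reports `get` on `dnses` in `config.openshift.io` at cluster scope and `get` on core-group `configmaps` in `openshift-storage`. It only describes what the API server denied; it does not say which role the operator is expected to ship.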
failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden

Version-Release number of selected component (if applicable):
=============================================================
4.16-110

How reproducible:
==================
Seen for the past couple of days.

Steps to Reproduce:
1.
2.
3.

Actual results:
====================
The status-reporter pod is in CrashLoopBackOff on the provider cluster with the native client installed.

Expected results:
==================

Additional info:
storageclient-737342087af10580-status-reporter-28624722   0/1   5m      5m
storageclient-737342087af10580-status-reporter-28624725   0/1   2m25s   2m25s

>> oc describe job storageclient-737342087af10580-status-reporter-28624725
Name:             storageclient-737342087af10580-status-reporter-28624725
Namespace:        openshift-storage
Selector:         batch.kubernetes.io/controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
Labels:           batch.kubernetes.io/controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                  batch.kubernetes.io/job-name=storageclient-737342087af10580-status-reporter-28624725
                  controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                  job-name=storageclient-737342087af10580-status-reporter-28624725
Annotations:      batch.kubernetes.io/cronjob-scheduled-timestamp: 2024-06-04T06:45:00Z
Controlled By:    CronJob/storageclient-737342087af10580-status-reporter
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Tue, 04 Jun 2024 12:15:05 +0530
Active Deadline Seconds:  155s
Pods Statuses:    0 Active (0 Ready) / 0 Succeeded / 2 Failed
Pod Template:
  Labels:           batch.kubernetes.io/controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                    batch.kubernetes.io/job-name=storageclient-737342087af10580-status-reporter-28624725
                    controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                    job-name=storageclient-737342087af10580-status-reporter-28624725
  Service Account:  ocs-client-operator-status-reporter
  Containers:
   heartbeat:
    Image:      registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59
    Port:       <none>
    Host Port:  <none>
    Command:
      /status-reporter
    Environment:
      STORAGE_CLIENT_NAME:  ocs-storagecluster
      OPERATOR_NAMESPACE:   openshift-storage
    Mounts:     <none>
  Volumes:      <none>
Events:
  Type     Reason            Age   From            Message
  ----     ------            ----  ----            -------
  Normal   SuccessfulCreate  3m1s  job-controller  Created pod: storageclient-737342087af10580-status-reporter-28624725-ccvt6
  Normal   SuccessfulCreate  58s   job-controller  Created pod: storageclient-737342087af10580-status-reporter-28624725-slkpd
  Normal   SuccessfulDelete  26s   job-controller  Deleted pod: storageclient-737342087af10580-status-reporter-28624725-slkpd
  Warning  DeadlineExceeded  26s   job-controller  Job was active longer than specified deadline

Logs are copied here: https://drive.google.com/file/d/1fJZtUSUaFdzr7-r5TzFSiXR7obuIOq1V/view?usp=drive_link
UI StorageClient page: https://docs.google.com/document/d/10Ir-cO6CLRwvpsyu_owsVotxCEGisKy43OsA-VqOSJ4/edit
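The numeric suffix on each status-reporter Job (28624719, 28624722, 28624725) is how the upstream Kubernetes CronJob controller names Jobs: `<cronjob-name>-<scheduled time in minutes since the Unix epoch>`. This matches the `cronjob-scheduled-timestamp` annotation in the Job description above. A minimal Python sketch for reading the timeline out of the names follows. The helpers `scheduled_time_from_job_name`, `job_name_of_pod` and `parse_age` are hypothetical. `job_name_of_pod` assumes the usual `<job-name>-<random suffix>` pod naming, and `parse_age` only handles kubectl-style ages made of hours, minutes and seconds.

```python
import re
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # the reporter's local time zone

# Assumption: kubectl-style ages such as "3m1s", "2m25s" or "26s" (no days).
_AGE_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def scheduled_time_from_job_name(job_name: str) -> datetime:
    """Decode the trailing minutes-since-epoch suffix of a CronJob-created Job."""
    minutes = int(job_name.rsplit("-", 1)[-1])
    return datetime.fromtimestamp(minutes * 60, tz=timezone.utc)


def job_name_of_pod(pod_name: str) -> str:
    """Strip the random suffix the Job controller appends to each pod name."""
    return pod_name.rsplit("-", 1)[0]


def parse_age(age: str) -> timedelta:
    """Convert an age like "3m1s" into a timedelta."""
    m = _AGE_RE.fullmatch(age)
    if not age or m is None:
        raise ValueError(f"unsupported age format: {age!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    prefix = "storageclient-737342087af10580-status-reporter-"
    jobs = [prefix + s for s in ("28624719", "28624722", "28624725")]
    times = [scheduled_time_from_job_name(j) for j in jobs]
    for job, t in zip(jobs, times):
        print(job, t.isoformat(), t.astimezone(IST).isoformat())
    # Spacing between consecutive scheduled runs.
    print([str(b - a) for a, b in zip(times, times[1:])])

    pod = prefix + "28624722-9qzfg"
    print(pod, "->", job_name_of_pod(pod))

    # Age of the first SuccessfulCreate event minus age of DeadlineExceeded.
    print(parse_age("3m1s") - parse_age("26s"))
```

Decoding 28624725 gives 2024-06-04T06:45:00+00:00 (12:15:00+05:30), the same as the annotation. The three Jobs were scheduled 3 minutes apart. The gap between the first pod creation (3m1s) and DeadlineExceeded (26s) is 0:02:35, which matches the Job's 155s Active Deadline Seconds.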