Bug 2290326 - Status-Reporter Pod in Error/CLBO -failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden
Summary: Status-Reporter Pod in Error/CLBO -failed to fetch current csi config map: co...
Keywords:
Status: CLOSED DUPLICATE of bug 2278593
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-client-operator
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Madhu Rajanna
QA Contact: Daniel Osypenko
URL:
Whiteboard: isf-provider
Depends On:
Blocks:
Reported: 2024-06-04 06:50 UTC by Neha Berry
Modified: 2024-06-04 07:07 UTC
CC List: 4 users

Fixed In Version: 4.16.0-118
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-06-04 07:07:26 UTC
Embargoed:



Description Neha Berry 2024-06-04 06:50:43 UTC
Description of problem:
==============================

On one of our long-running bare-metal (BM) clusters, where we have been performing multiple tasks (failures, key rotation, OCP clients, HCP clients, etc.), the status-reporter pod has been in CrashLoopBackOff (CLBO) for several days.

>>

oc get pods -o wide|grep status
storageclient-737342087af10580-status-reporter-28624719-ftc49     0/1     ContainerStatusUnknown   4 (3m21s ago)   4m49s   10.130.1.83    baremetal1-03   <none>           <none>
storageclient-737342087af10580-status-reporter-28624722-8d74p     0/1     ContainerStatusUnknown   4 (44s ago)     2m14s   10.130.1.86    baremetal1-03   <none>           <none>
storageclient-737342087af10580-status-reporter-28624722-9qzfg     0/1     CrashLoopBackOff         1 (9s ago)      11s     10.130.1.87    baremetal1-03   <none>           <none>


>> date ; oc logs storageclient-737342087af10580-status-reporter-28624722-9qzfg
Tue Jun  4 12:15:02 IST 2024
W0604 06:44:52.170711       1 main.go:160] Failed to get clusterDNS "cluster": dnses.config.openshift.io "cluster" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "dnses" in API group "config.openshift.io" at the cluster scope
W0604 06:44:52.170828       1 main.go:164] Cluster Base Domain is empty.
F0604 06:44:52.233151       1 main.go:142] Failed to update mon configmap for storageClient 85d4a535-2380-4bda-a42c-727777f27be6: failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "configmaps" in API group "" in the namespace "openshift-storage"
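
Both denials above follow the standard Kubernetes RBAC wording (`User "<user>" cannot <verb> resource "<resource>" in API group "<group>" ...`), so they can be turned mechanically into `oc auth can-i` checks for the status-reporter service account. The helper below is a hypothetical diagnostic sketch (the name `rbac_checks` is ours, not part of ODF or `oc`); it assumes the message format shown in these logs and only prints the commands, it does not run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read container log lines on stdin and print one
# `oc auth can-i` command per distinct RBAC denial found in them.
# Assumed message format (as in the status-reporter logs):
#   ... User "<user>" cannot <verb> resource "<resource>" in API group "<group>"
#       (in the namespace "<ns>" | at the cluster scope)
# An empty group ("") is the core API group, so no ".<group>" suffix is added.
rbac_checks() {
  sed -nE \
    -e 's/.*User "([^"]+)" cannot ([a-z]+) resource "([^"]+)" in API group "" in the namespace "([^"]+)".*/oc auth can-i \2 \3 -n \4 --as=\1/p' \
    -e 's/.*User "([^"]+)" cannot ([a-z]+) resource "([^"]+)" in API group "([^"]+)" in the namespace "([^"]+)".*/oc auth can-i \2 \3.\4 -n \5 --as=\1/p' \
    -e 's/.*User "([^"]+)" cannot ([a-z]+) resource "([^"]+)" in API group "" at the cluster scope.*/oc auth can-i \2 \3 --as=\1/p' \
    -e 's/.*User "([^"]+)" cannot ([a-z]+) resource "([^"]+)" in API group "([^"]+)" at the cluster scope.*/oc auth can-i \2 \3.\4 --as=\1/p' |
    sort -u   # the CronJob retries print the same denial many times
}
```

For example, `oc logs storageclient-737342087af10580-status-reporter-28624722-9qzfg | rbac_checks` would print one check for `dnses.config.openshift.io` (cluster scope) and one for `configmaps` in `openshift-storage`. Running them as a user allowed to impersonate service accounts (for example cluster-admin) shows which of the denied permissions are still missing.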


Containers:
  heartbeat:
    Container ID:  cri-o://2fd250da9b45c40d71c6ee6f0b586477785e8d3ab7f5e69b72b8bf0f5f6fca1a
    Image:         registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59
    Image ID:      registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59
    Port:          <none>
    Host Port:     <none>
    Command:
      /status-reporter
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 04 Jun 2024 12:15:22 +0530
      Finished:     Tue, 04 Jun 2024 12:15:22 +0530
    Ready:          False
    Restart Count:  2
    Environment:
      STORAGE_CLIENT_NAME:  ocs-storagecluster
      OPERATOR_NAMESPACE:   openshift-storage


Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       39s                default-scheduler  Successfully assigned openshift-storage/storageclient-737342087af10580-status-reporter-28624725-ccvt6 to baremetal1-03
  Normal   AddedInterface  40s                multus             Add eth0 [10.130.1.88/23] from ovn-kubernetes
  Normal   Pulled          24s (x3 over 40s)  kubelet            Container image "registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59" already present on machine
  Normal   Created         23s (x3 over 40s)  kubelet            Created container heartbeat
  Normal   Started         23s (x3 over 40s)  kubelet            Started container heartbeat
  Warning  BackOff         11s (x4 over 38s)  kubelet            Back-off restarting failed container heartbeat in pod storageclient-737342087af10580-status-reporter-28624725-ccvt6_openshift-storage(491a9b6a-064b-4058-94de-5da18c602a71)

failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden

Version-Release number of selected component (if applicable):
=============================================================
4.16-110

How reproducible:
==================
Seen for the past couple of days.

Steps to Reproduce:
1.
2.
3.

Actual results:
====================
The status-reporter pod is in CrashLoopBackOff (CLBO) on the provider cluster, which has the native client installed.

Expected results:
==================

Additional info:


NAME                                                      COMPLETIONS   DURATION   AGE
storageclient-737342087af10580-status-reporter-28624722   0/1           5m         5m
storageclient-737342087af10580-status-reporter-28624725   0/1           2m25s      2m25s

oc describe job storageclient-737342087af10580-status-reporter-28624725
Name:                     storageclient-737342087af10580-status-reporter-28624725
Namespace:                openshift-storage
Selector:                 batch.kubernetes.io/controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
Labels:                   batch.kubernetes.io/controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                          batch.kubernetes.io/job-name=storageclient-737342087af10580-status-reporter-28624725
                          controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                          job-name=storageclient-737342087af10580-status-reporter-28624725
Annotations:              batch.kubernetes.io/cronjob-scheduled-timestamp: 2024-06-04T06:45:00Z
Controlled By:            CronJob/storageclient-737342087af10580-status-reporter
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Tue, 04 Jun 2024 12:15:05 +0530
Active Deadline Seconds:  155s
Pods Statuses:            0 Active (0 Ready) / 0 Succeeded / 2 Failed
Pod Template:
  Labels:           batch.kubernetes.io/controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                    batch.kubernetes.io/job-name=storageclient-737342087af10580-status-reporter-28624725
                    controller-uid=ae6d9ff7-db8f-47b9-8714-d2811a1e6bc4
                    job-name=storageclient-737342087af10580-status-reporter-28624725
  Service Account:  ocs-client-operator-status-reporter
  Containers:
   heartbeat:
    Image:      registry.redhat.io/odf4/ocs-client-rhel9-operator@sha256:56b046004516dd718892fab91b052ea5eb968101de12eaba97494a2f04d1fd59
    Port:       <none>
    Host Port:  <none>
    Command:
      /status-reporter
    Environment:
      STORAGE_CLIENT_NAME:  ocs-storagecluster
      OPERATOR_NAMESPACE:   openshift-storage
    Mounts:                 <none>
  Volumes:                  <none>
Events:
  Type     Reason            Age   From            Message
  ----     ------            ----  ----            -------
  Normal   SuccessfulCreate  3m1s  job-controller  Created pod: storageclient-737342087af10580-status-reporter-28624725-ccvt6
  Normal   SuccessfulCreate  58s   job-controller  Created pod: storageclient-737342087af10580-status-reporter-28624725-slkpd
  Normal   SuccessfulDelete  26s   job-controller  Deleted pod: storageclient-737342087af10580-status-reporter-28624725-slkpd
  Warning  DeadlineExceeded  26s   job-controller  Job was active longer than specified deadline
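
The numeric suffixes on the job names (28624719, 28624722, 28624725) are not random: the Kubernetes CronJob controller names each Job `<cronjob-name>-<scheduled time in minutes since the Unix epoch>`. Decoding the suffix is a quick way to line job and pod names up with log timestamps. A minimal sketch, assuming GNU `date` (for the `-d @SECONDS` syntax); `job_scheduled_time` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the scheduled time (UTC, ISO 8601) encoded in
# the name of a Job created by a CronJob. The CronJob controller appends the
# scheduled time as whole minutes since the Unix epoch.
# Requires GNU date for the "-d @SECONDS" syntax.
job_scheduled_time() {
  local job_name="$1"
  local minutes="${job_name##*-}"   # text after the last '-'
  # Reject names whose last segment is not purely numeric.
  case "$minutes" in
    ''|*[!0-9]*) echo "not a CronJob-generated job name: $job_name" >&2; return 1 ;;
  esac
  date -u -d "@$((minutes * 60))" '+%Y-%m-%dT%H:%M:%SZ'
}
```

For `storageclient-737342087af10580-status-reporter-28624725` this gives 2024-06-04T06:45:00Z, matching the `batch.kubernetes.io/cronjob-scheduled-timestamp` annotation above. The suffixes 28624719 and 28624722 decode to 06:39 and 06:42 UTC, consistent with the CronJob running every 3 minutes; in the describe output above, this run's pods kept failing until the Job exceeded its 155 s active deadline.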



Logs copied here: https://drive.google.com/file/d/1fJZtUSUaFdzr7-r5TzFSiXR7obuIOq1V/view?usp=drive_link

UI StorageClient page: https://docs.google.com/document/d/10Ir-cO6CLRwvpsyu_owsVotxCEGisKy43OsA-VqOSJ4/edit

