Bug 2254216

Summary: [Provider-Client deployment] storageclient-status-reporter CLBO. storageclass claims stuck in configuring
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Daniel Osypenko <dosypenk>
Component: ocs-client-operator Assignee: Leela Venkaiah Gangavarapu <lgangava>
Status: CLOSED ERRATA QA Contact: Daniel Osypenko <dosypenk>
Severity: high Docs Contact:
Priority: high    
Version: 4.14 CC: lgangava, muagarwa, nigoyal, odf-bz-bot, omitrani, resoni
Target Milestone: ---   
Target Release: ODF 4.15.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: isf-provider
Fixed In Version: 4.15.0-136 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-03-19 15:29:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Daniel Osypenko 2023-12-12 17:51:31 UTC
Description of problem:

When deploying a setup with ODF 4.14.2-1, the storageclient-e12669861f4f0a87-status-reporter pod went into CrashLoopBackOff after the StorageClient was applied, and the StorageClassClaims are stuck in Configuring.

The issue blocks the deployment process; no workaround was found.

> oc get pod -n openshift-storage-client
NAME                                                            READY   STATUS                   RESTARTS        AGE
console-7679f44d76-lqf42                                        1/1     Running                  0               19m
csi-addons-controller-manager-58dc98dd5-hpxx2                   2/2     Running                  0               19m
ocs-client-operator-console-7679f44d76-zsr8t                    1/1     Running                  0               19m
ocs-client-operator-controller-manager-7c65b7b5f-g2tfh          2/2     Running                  0               19m
storageclient-e12669861f4f0a87-status-reporter-28373221-lkjlf   0/1     ContainerStatusUnknown   4 (2m24s ago)   3m58s
storageclient-e12669861f4f0a87-status-reporter-28373224-gxz4k   0/1     CrashLoopBackOff         3 (37s ago)     83s

logs:

> oc logs storageclient-e12669861f4f0a87-status-reporter-28373226-fv5x5 -n openshift-storage-client
F1212 15:08:33.799541       1 main.go:166] Failed to update mon configmap for storageClient 17d7cd6d-e87f-4495-b7d7-a24172fd0dbb: failed to fetch current csi config map: configmaps "ceph-csi-configs" not found
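
The pod listing above can be triaged mechanically. The following is a minimal sketch in Python, assuming the plain-text table format printed by `oc get pod`. The function name `unhealthy_pods` and the set of statuses treated as healthy are illustrative assumptions, not part of any ODF tooling.

```python
# Statuses treated as healthy in this sketch (an assumption for illustration).
HEALTHY = {"Running", "Completed"}


def unhealthy_pods(table):
    """Return (pod name, status) for every row whose STATUS is not in HEALTHY."""
    rows = [line.split() for line in table.strip().splitlines()]
    header, body = rows[0], rows[1:]
    # NAME, READY and STATUS are the first three columns, so whitespace
    # splitting is safe even though RESTARTS may contain spaces.
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    return [(r[name_col], r[status_col]) for r in body if r[status_col] not in HEALTHY]


# Rows copied from the listing in this report.
SAMPLE = """\
NAME                                                            READY   STATUS                   RESTARTS        AGE
console-7679f44d76-lqf42                                        1/1     Running                  0               19m
storageclient-e12669861f4f0a87-status-reporter-28373221-lkjlf   0/1     ContainerStatusUnknown   4 (2m24s ago)   3m58s
storageclient-e12669861f4f0a87-status-reporter-28373224-gxz4k   0/1     CrashLoopBackOff         3 (37s ago)     83s
"""

for name, status in unhealthy_pods(SAMPLE):
    print(f"{name}: {status}")
```

In practice the table text would come from the command output rather than an embedded sample.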

The storage class claims are stuck in Configuring:

> oc get StorageClassClaims -n openshift-storage-client -w

NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Configuring
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Configuring


> oc logs rook-ceph-operator-659dfd5cd7-68z7z -n openshift-storage

 ceph-client-controller: failed to set ceph client "openshift-storage/a99dd2954587ecf99aef18111acb98cd" status to "Progressing". failed to update object "openshift-storage/a99dd2954587ecf99aef18111acb98cd" status: Operation cannot be fulfilled on cephclients.ceph.rook.io "a99dd2954587ecf99aef18111acb98cd": the object has been modified; please apply your changes to the latest version and try again
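
The rook-ceph-operator message above is the standard Kubernetes optimistic-concurrency conflict: the update carried a stale resourceVersion. Controllers typically handle it by re-reading the object and retrying. Below is a minimal sketch of that pattern in Python against an in-memory stand-in. `FakeStore`, `Conflict`, and `update_status_with_retry` are hypothetical names for illustration and do not reflect Rook's actual implementation. Whether this conflict is related to the failure reported here is not established by the log alone.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resource version."""


class FakeStore:
    """In-memory stand-in for an API server holding a single object."""

    def __init__(self):
        self.obj = {"resourceVersion": 1, "status": "Pending"}

    def get(self):
        # Return a copy, as a client would receive its own view of the object.
        return dict(self.obj)

    def update(self, new_obj):
        if new_obj["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        # Accept the write and bump the version.
        self.obj = dict(new_obj, resourceVersion=self.obj["resourceVersion"] + 1)


def update_status_with_retry(store, status, attempts=5):
    """Re-read the latest object before each attempt and retry on Conflict."""
    for _ in range(attempts):
        current = store.get()
        current["status"] = status
        try:
            store.update(current)
            return True
        except Conflict:
            continue
    return False


store = FakeStore()
stale = store.get()
# Another writer modifies the object, so `stale` is now out of date.
update_status_with_retry(store, "Connected")

stale["status"] = "Progressing"
try:
    store.update(stale)
except Conflict as err:
    print("stale write rejected:", err)

# Retrying from a fresh read succeeds.
print(update_status_with_retry(store, "Progressing"), store.get()["status"])
```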

Version-Release number of selected component (if applicable):

OCP 4.15.0-ec.2
ODF 4.14.2-1

How reproducible:
Deploy the Provider; see the steps for this particular setup: https://docs.google.com/document/d/14ivvbdHp-p1Vn9Y80RIK7P7qq39jrYaYoyt1kuTWc_Y/edit?usp=sharing

Steps to Reproduce:
1. Prepare the Provider.
2. Subscribe the client (client is Connected).
3. Create StorageClassClaims for the rbd and cephfs StorageClasses.

Actual results:
The storageclient-e12669861f4f0a87-status-reporter-28373378-265lk pod is in CrashLoopBackOff (CLBO).
If we proceed with the StorageClassClaims, both the rbd and cephfs claims never become Ready and stay stuck in the Configuring state.
Although the StorageClasses are created on the Provider, PVCs using them are stuck in the Pending state.

Expected results:
The storageclient-e12669861f4f0a87-status-reporter-28373378-265lk pod is Running.
The StorageClassClaims become Ready.
The PVCs become Ready.
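
The expected end state can be checked by polling the claim phases until every claim reports Ready. This is a minimal sketch in Python, assuming a caller-supplied `fetch_phases` function (hypothetical; in practice it might wrap `oc get StorageClassClaims`) that returns a mapping of claim name to phase. The function name, timeout, and interval are illustrative assumptions.

```python
import time


def wait_until_ready(fetch_phases, timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_phases() until all phases are "Ready" or the timeout expires.

    Returns (True, {}) on success, or (False, pending) where pending maps
    each claim that is not Ready to its last observed phase.
    """
    deadline = clock() + timeout
    while True:
        phases = fetch_phases()
        pending = {name: phase for name, phase in phases.items() if phase != "Ready"}
        if not pending:
            return True, {}
        if clock() >= deadline:
            return False, pending
        sleep(interval)


# Simulated sequence of observations, using the claim names from this report.
snapshots = iter([
    {"ocs-storagecluster-ceph-rbd": "Configuring", "ocs-storagecluster-cephfs": "Configuring"},
    {"ocs-storagecluster-ceph-rbd": "Ready", "ocs-storagecluster-cephfs": "Configuring"},
    {"ocs-storagecluster-ceph-rbd": "Ready", "ocs-storagecluster-cephfs": "Ready"},
])

# sleep is replaced with a no-op so the simulation finishes immediately.
ok, pending = wait_until_ready(lambda: next(snapshots), sleep=lambda _: None)
print(ok, pending)
```

Injecting `sleep` and `clock` keeps the helper testable without waiting in real time.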

Additional info:

After removing the StorageClient and creating it again, the issue reproduced 3 times.

This is the first setup in practice where the StorageProfile has pg_autoscale_mode set to "on":

apiVersion: ocs.openshift.io/v1
kind: StorageProfile
metadata:
  labels:
    app.kubernetes.io/name: storageprofile
    app.kubernetes.io/instance: ssd-storageprofile
    app.kubernetes.io/part-of: ocs-operator
    app.kubernetes.io/managed-by:  ocs-operator
  name: ssd-storageprofile
  namespace: openshift-storage
spec:
  deviceClass: ssd
  blockPoolConfiguration:
    pg_autoscale_mode: "on"
  sharedFilesystemConfiguration:
    pg_autoscale_mode: "on"

must-gather: https://drive.google.com/drive/folders/1RIwz33HXc9QAZrNNeVwMX_BytTE4jSsp?usp=sharing

Comment 5 Daniel Osypenko 2023-12-13 08:59:39 UTC
Thanks, Leela. I followed your instructions: I changed the OCP version to v4.15.0-ec.2 in the ConfigMap and reinstalled the client.
The pods in the client's namespace are now:

oc get pod -n openshift-storage-client
NAME                                                            READY   STATUS      RESTARTS   AGE
console-7679f44d76-lqf42                                        1/1     Running     0          18h
csi-addons-controller-manager-58dc98dd5-hpxx2                   2/2     Running     0          18h
csi-cephfsplugin-27dn8                                          2/2     Running     0          96s
csi-cephfsplugin-7xrg5                                          2/2     Running     0          96s
csi-cephfsplugin-cmfnq                                          2/2     Running     0          96s
csi-cephfsplugin-fb5xs                                          2/2     Running     0          96s
csi-cephfsplugin-provisioner-549555965d-gq7hv                   5/5     Running     0          96s
csi-cephfsplugin-provisioner-549555965d-smvnk                   5/5     Running     0          96s
csi-cephfsplugin-qcnbj                                          2/2     Running     0          96s
csi-cephfsplugin-xh49c                                          2/2     Running     0          96s
csi-rbdplugin-569pq                                             3/3     Running     0          96s
csi-rbdplugin-77wvd                                             3/3     Running     0          96s
csi-rbdplugin-fccbr                                             3/3     Running     0          96s
csi-rbdplugin-l5sz2                                             3/3     Running     0          96s
csi-rbdplugin-l9ng4                                             3/3     Running     0          96s
csi-rbdplugin-nbzww                                             3/3     Running     0          96s
csi-rbdplugin-provisioner-64df688f89-d8mwp                      5/5     Running     0          96s
csi-rbdplugin-provisioner-64df688f89-tcnvm                      5/5     Running     0          96s
ocs-client-operator-console-7679f44d76-zsr8t                    1/1     Running     0          18h
ocs-client-operator-controller-manager-7c65b7b5f-g2tfh          2/2     Running     0          18h
storageclient-e12669861f4f0a87-status-reporter-28374296-gkkgg   0/1     Completed   0          38s

Comment 12 Daniel Osypenko 2024-02-14 20:50:17 UTC
Bug verified on:
OCP 4.15.0-0.nightly-2024-01-25-051548
ODF odf-operator.v4.14.5-8.stable

detailed steps:
https://docs.google.com/document/d/1DNrAWjH8Pn89EcX02HH_UqFSJrD54yun6Nn8RkuCBkU/edit?usp=sharing

Comment 14 errata-xmlrpc 2024-03-19 15:29:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383