2254216 – [Provider-Client deployment] storageclient-status-reporter CLBO. storageclass claims stuck in configuring

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	ocs-client-operator
Sub Component:
Version:	4.14
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	ODF 4.15.0
Assignee:	Leela Venkaiah Gangavarapu
QA Contact:	Daniel Osypenko
Docs Contact:
URL:
Whiteboard:	isf-provider
Depends On:
Blocks:

Reported:	2023-12-12 17:51 UTC by Daniel Osypenko
Modified:	2024-03-19 15:29 UTC
CC List:	6 users
Fixed In Version:	4.15.0-136
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-03-19 15:29:39 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	red-hat-storage ocs-client-operator pull 65	None	open	controllers: pick next best version of csi images	2024-02-01 07:55:41 UTC
Github	red-hat-storage ocs-client-operator pull 69	None	open	Bug 2254216: [release-4.15] controllers: pick next best version of csi images	2024-02-06 09:33:51 UTC
Red Hat Product Errata	RHSA-2024:1383	None	None	None	2024-03-19 15:29:42 UTC

Description Daniel Osypenko 2023-12-12 17:51:31 UTC

Description of problem:

While deploying a setup with ODF 4.14.2-1, the storageclient-e12669861f4f0a87-status-reporter pod went into CrashLoopBackOff after the StorageClient was applied, and the StorageClassClaims are stuck in Configuring.

The issue blocks the deployment process; no workaround has been found.

> oc get pod -n openshift-storage-client
NAME                                                            READY   STATUS                   RESTARTS        AGE
console-7679f44d76-lqf42                                        1/1     Running                  0               19m
csi-addons-controller-manager-58dc98dd5-hpxx2                   2/2     Running                  0               19m
ocs-client-operator-console-7679f44d76-zsr8t                    1/1     Running                  0               19m
ocs-client-operator-controller-manager-7c65b7b5f-g2tfh          2/2     Running                  0               19m
storageclient-e12669861f4f0a87-status-reporter-28373221-lkjlf   0/1     ContainerStatusUnknown   4 (2m24s ago)   3m58s
storageclient-e12669861f4f0a87-status-reporter-28373224-gxz4k   0/1     CrashLoopBackOff         3 (37s ago)     83s

> oc logs storageclient-e12669861f4f0a87-status-reporter-28373226-fv5x5 -n openshift-storage-client
F1212 15:08:33.799541       1 main.go:166] Failed to update mon configmap for storageClient 17d7cd6d-e87f-4495-b7d7-a24172fd0dbb: failed to fetch current csi config map: configmaps "ceph-csi-configs" not found
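
For triage, the pod listing above can be filtered down to the pods that are neither Running nor Completed. This is a minimal sketch, not part of any ODF tooling: the helper names `parse_table` and `unhealthy_pods` are made up for illustration, and it assumes the default `oc get pod` table layout, where columns are separated by at least two spaces and no cell is empty.

```python
import re
from typing import Dict, List

# Statuses treated as healthy in this sketch (an assumption for illustration).
HEALTHY = {"Running", "Completed"}


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse a whitespace-aligned table into one dict per row.

    Cells are split on runs of two or more spaces so that values such as
    "4 (2m24s ago)" in the RESTARTS column stay in a single cell.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_pods(text: str) -> List[str]:
    """Return the names of pods whose STATUS is not in HEALTHY."""
    return [row["NAME"] for row in parse_table(text)
            if row.get("STATUS") not in HEALTHY]


SAMPLE = """\
NAME                                                            READY   STATUS                   RESTARTS        AGE
console-7679f44d76-lqf42                                        1/1     Running                  0               19m
storageclient-e12669861f4f0a87-status-reporter-28373221-lkjlf   0/1     ContainerStatusUnknown   4 (2m24s ago)   3m58s
storageclient-e12669861f4f0a87-status-reporter-28373224-gxz4k   0/1     CrashLoopBackOff         3 (37s ago)     83s
"""

if __name__ == "__main__":
    for name in unhealthy_pods(SAMPLE):
        print(name)
```

The same parser would misalign rows of a table that has an empty cell (for example the STORAGEPROFILE column of the StorageClassClaims listing), so it should only be used on the pod table as-is.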

The storage class claims are stuck in Configuring:

> oc get StorageClassClaims -n openshift-storage-client -w

NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Configuring
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Configuring


> oc logs rook-ceph-operator-659dfd5cd7-68z7z -n openshift-storage

 ceph-client-controller: failed to set ceph client "openshift-storage/a99dd2954587ecf99aef18111acb98cd" status to "Progressing". failed to update object "openshift-storage/a99dd2954587ecf99aef18111acb98cd" status: Operation cannot be fulfilled on cephclients.ceph.rook.io "a99dd2954587ecf99aef18111acb98cd": the object has been modified; please apply your changes to the latest version and try again
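
The "object has been modified" text in the log above is the standard Kubernetes API conflict response for an update made against a stale resourceVersion. The sketch below illustrates that optimistic-concurrency check and the usual re-read-and-retry handling. It is not Rook's code: `FakeStore`, `ConflictError` and `update_status_with_retry` are hypothetical stand-ins for an API server and a client, and whether this conflict plays any role in the bug is not established by this report.

```python
import copy


class ConflictError(Exception):
    """Raised when an update is based on a stale resourceVersion."""


class FakeStore:
    """In-memory stand-in for an API server holding a single object."""

    def __init__(self, obj):
        self._obj = dict(obj, resourceVersion=1)

    def get(self):
        # Hand out a copy so callers cannot mutate the stored object.
        return copy.deepcopy(self._obj)

    def update(self, obj):
        # Reject writes that were prepared from an older version.
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise ConflictError("the object has been modified")
        stored = copy.deepcopy(obj)
        stored["resourceVersion"] += 1
        self._obj = stored
        return copy.deepcopy(stored)


def update_status_with_retry(store, phase, attempts=5):
    """Re-read the latest object before every write, retrying on conflict."""
    for _ in range(attempts):
        current = store.get()
        current["status"] = {"phase": phase}
        try:
            return store.update(current)
        except ConflictError:
            continue
    raise ConflictError(f"still conflicting after {attempts} attempts")


if __name__ == "__main__":
    store = FakeStore({"name": "a99dd2954587ecf99aef18111acb98cd"})
    stale = store.get()
    store.update(dict(stale, spec="changed"))  # a concurrent writer wins
    try:
        store.update(stale)                    # the stale write is rejected
    except ConflictError as err:
        print("conflict:", err)
    print(update_status_with_retry(store, "Progressing"))
```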

Version-Release number of selected component (if applicable):

OCP 4.15.0-ec.2
ODF 4.14.2-1

How reproducible:
Deploy a Provider; see the steps for this particular setup: https://docs.google.com/document/d/14ivvbdHp-p1Vn9Y80RIK7P7qq39jrYaYoyt1kuTWc_Y/edit?usp=sharing

Steps to Reproduce:
1. Prepare the Provider.
2. Subscribe the client (the client reaches Connected).
3. Create StorageClassClaims for the rbd and cephfs storage classes.

Actual results:
The storageclient-e12669861f4f0a87-status-reporter-28373378-265lk pod is in CLBO.
When proceeding with the StorageClassClaims, neither the rbd nor the cephfs claim becomes Ready; both stay stuck in the Configuring state.
Although the StorageClasses are created on the Provider, PVCs that use them are stuck in the Pending state.

Expected results:
The storageclient-e12669861f4f0a87-status-reporter-28373378-265lk pod is Running.
The StorageClassClaims become Ready.
The PVCs become Ready.

Additional info:

After removing the StorageClient and creating it again, the issue reproduced 3 times.

This is the first setup in practice in which the StorageProfile has pg_autoscale_mode set to "on":

apiVersion: ocs.openshift.io/v1
kind: StorageProfile
metadata:
  labels:
    app.kubernetes.io/name: storageprofile
    app.kubernetes.io/instance: ssd-storageprofile
    app.kubernetes.io/part-of: ocs-operator
    app.kubernetes.io/managed-by: ocs-operator
  name: ssd-storageprofile
  namespace: openshift-storage
spec:
  deviceClass: ssd
  blockPoolConfiguration:
    pg_autoscale_mode: "on"
  sharedFilesystemConfiguration:
    pg_autoscale_mode: "on"

must-gather: https://drive.google.com/drive/folders/1RIwz33HXc9QAZrNNeVwMX_BytTE4jSsp?usp=sharing

Comment 5 Daniel Osypenko 2023-12-13 08:59:39 UTC

Thanks Leela. I followed your instructions: I changed the OCP version to v4.15.0-ec.2 in the cm and reinstalled the client.
The pods in the client's namespace are now:

> oc get pod -n openshift-storage-client
NAME                                                            READY   STATUS      RESTARTS   AGE
console-7679f44d76-lqf42                                        1/1     Running     0          18h
csi-addons-controller-manager-58dc98dd5-hpxx2                   2/2     Running     0          18h
csi-cephfsplugin-27dn8                                          2/2     Running     0          96s
csi-cephfsplugin-7xrg5                                          2/2     Running     0          96s
csi-cephfsplugin-cmfnq                                          2/2     Running     0          96s
csi-cephfsplugin-fb5xs                                          2/2     Running     0          96s
csi-cephfsplugin-provisioner-549555965d-gq7hv                   5/5     Running     0          96s
csi-cephfsplugin-provisioner-549555965d-smvnk                   5/5     Running     0          96s
csi-cephfsplugin-qcnbj                                          2/2     Running     0          96s
csi-cephfsplugin-xh49c                                          2/2     Running     0          96s
csi-rbdplugin-569pq                                             3/3     Running     0          96s
csi-rbdplugin-77wvd                                             3/3     Running     0          96s
csi-rbdplugin-fccbr                                             3/3     Running     0          96s
csi-rbdplugin-l5sz2                                             3/3     Running     0          96s
csi-rbdplugin-l9ng4                                             3/3     Running     0          96s
csi-rbdplugin-nbzww                                             3/3     Running     0          96s
csi-rbdplugin-provisioner-64df688f89-d8mwp                      5/5     Running     0          96s
csi-rbdplugin-provisioner-64df688f89-tcnvm                      5/5     Running     0          96s
ocs-client-operator-console-7679f44d76-zsr8t                    1/1     Running     0          18h
ocs-client-operator-controller-manager-7c65b7b5f-g2tfh          2/2     Running     0          18h
storageclient-e12669861f4f0a87-status-reporter-28374296-gkkgg   0/1     Completed   0          38s
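
The linked pull requests are titled "controllers: pick next best version of csi images", and the workaround above was to edit the OCP version in the cm by hand. Below is a minimal sketch of what such a fallback could look like. It is not the operator's code (the operator is not written in Python), and it rests on assumptions this report does not confirm: that CSI image sets are keyed by platform version, and that "next best" means the highest listed version that is not newer than the cluster. `IMAGE_SETS`, its image values, `parse_major_minor` and `pick_next_best` are hypothetical names.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical table of CSI image sets keyed by platform version.
IMAGE_SETS: Dict[str, Dict[str, str]] = {
    "v4.12": {"cephcsi": "example.invalid/cephcsi:4.12"},
    "v4.13": {"cephcsi": "example.invalid/cephcsi:4.13"},
    "v4.14": {"cephcsi": "example.invalid/cephcsi:4.14"},
}


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like 'v4.14' or '4.15.0-ec.2'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pick_next_best(image_sets: Dict[str, Dict[str, str]],
                   cluster_version: str) -> Optional[str]:
    """Return the highest listed key not newer than the cluster version.

    Returns None when every listed version is newer than the cluster.
    """
    target = parse_major_minor(cluster_version)
    eligible = [(parse_major_minor(key), key) for key in image_sets
                if parse_major_minor(key) <= target]
    if not eligible:
        return None
    return max(eligible)[1]


if __name__ == "__main__":
    # With no 4.15 entry listed, a 4.15 cluster falls back to the 4.14 set.
    print(pick_next_best(IMAGE_SETS, "4.15.0-ec.2"))
```

Under these assumptions a cluster on a platform version newer than anything in the table would still get a CSI image set, so no manual edit of the cm would be needed.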

Comment 12 Daniel Osypenko 2024-02-14 20:50:17 UTC

Bug verified on:
OCP 4.15.0-0.nightly-2024-01-25-051548
ODF odf-operator.v4.14.5-8.stable

Detailed steps:
https://docs.google.com/document/d/1DNrAWjH8Pn89EcX02HH_UqFSJrD54yun6Nn8RkuCBkU/edit?usp=sharing

Comment 14 errata-xmlrpc 2024-03-19 15:29:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383
