Description of problem (please be as detailed as possible and provide log snippets):

StorageClassClaim is not getting created on a consumer cluster. The following error appears when creating a StorageClassClaim in the openshift-storage namespace.

$ oc get storageclassclaim -n openshift-storage
NAME                             STORAGETYPE        PHASE
test-storageclassclaim-cephfs1   sharedfilesystem   Configuring

ocs-operator logs:

{"level":"info","ts":1654078402.6853268,"logger":"controller.storageclassclaim","msg":"Reconciling StorageClassClaim.","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"test-storageclassclaim-cephfs1","namespace":"openshift-storage","StorageClassClaim":"openshift-storage/test-storageclassclaim-cephfs1"}

{"level":"info","ts":1654078402.6853898,"logger":"controller.storageclassclaim","msg":"Running StorageClassClaim controller in Consumer Mode","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"test-storageclassclaim-cephfs1","namespace":"openshift-storage","StorageClassClaim":"openshift-storage/test-storageclassclaim-cephfs1"}

{"level":"error","ts":1654078402.7412593,"logger":"controller.storageclassclaim","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"test-storageclassclaim-cephfs1","namespace":"openshift-storage","error":"failed to get StorageClassClaim config: rpc error: code = Unavailable desc = storage class claim \"test-storageclassclaim-cephfs1\" for \"56f2103b-264f-45cf-887d-edb7c2bffaae\" is in \"Creating\" phase","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

must-gather logs -
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j1-c1/jijoy-j1-c1_20220601T043509/logs/testcases_1654079214/

The Phase of the StorageClassClaim is the same as reported initially in bug #2089552, but the error is different.

Workaround: respin the ocs-operator pods on the consumer and the provider.

===============================================================
Version of all relevant components (if applicable):

$ oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.11.0                      NooBaa Operator               4.11.0            mcg-operator.v4.10.2                      Succeeded
ocs-operator.v4.11.0                      OpenShift Container Storage   4.11.0            ocs-operator.v4.10.2                      Succeeded
ocs-osd-deployer.v2.0.2                   OCS OSD Deployer              2.0.2             ocs-osd-deployer.v2.0.1                   Succeeded
odf-csi-addons-operator.v4.11.0           CSI Addons                    4.11.0            odf-csi-addons-operator.v4.10.2           Succeeded
odf-operator.v4.11.0                      OpenShift Data Foundation     4.11.0            odf-operator.v4.10.2                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.418-6459408   Route Monitor Operator        0.1.418-6459408   route-monitor-operator.v0.1.408-c2256a2   Succeeded

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.15   True        False         6h1m    Cluster version is 4.10.15

$ oc get csv odf-operator.v4.11.0 -o yaml | grep full_version
  full_version: 4.11.0-85

===========================================================================
Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes. Cannot create a StorageClassClaim.

Is there any workaround available to the best of your knowledge?
Respin the ocs-operator pods on the consumer and the provider.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced? Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
Create a StorageClassClaim on the consumer cluster in the openshift-storage namespace using the YAML given below.

apiVersion: ocs.openshift.io/v1alpha1
kind: StorageClassClaim
metadata:
  name: test-storageclassclaim-cephfs1
spec:
  type: sharedfilesystem

2. Verify the phase of the StorageClassClaim.

Actual results:
$ oc get storageclassclaim -n openshift-storage
NAME                             STORAGETYPE        PHASE
test-storageclassclaim-cephfs1   sharedfilesystem   Configuring

Expected results:
PHASE should be Ready. A new storage class should be created.

Additional info:
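The consumer-side failure mode reported here can be illustrated with a small Go sketch: the provider RPC returns an Unavailable-style error while the claim is still "Creating", and the consumer reconciler is expected to treat that as transient and requeue rather than stay stuck. All type and function names below are hypothetical; this is not the ocs-operator code, only a minimal model of the phase semantics visible in the logs.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase mirrors the claim phases seen in the bug ("Creating", "Configuring", "Ready").
type Phase string

const (
	PhaseCreating    Phase = "Creating"
	PhaseConfiguring Phase = "Configuring"
	PhaseReady       Phase = "Ready"
)

// errClaimNotReady stands in for the gRPC Unavailable error the provider returns.
var errClaimNotReady = errors.New("claim config not yet available")

// getClaimConfig models the provider call that fails until the claim is Ready.
func getClaimConfig(phase Phase) (string, error) {
	if phase != PhaseReady {
		return "", fmt.Errorf("%w: claim is in %q phase", errClaimNotReady, phase)
	}
	return "storage-class-config", nil
}

func main() {
	for _, p := range []Phase{PhaseCreating, PhaseReady} {
		cfg, err := getClaimConfig(p)
		if errors.Is(err, errClaimNotReady) {
			// A healthy consumer reconciler would requeue here instead of
			// surfacing a permanent "Reconciler error".
			fmt.Printf("phase %s: transient, requeue\n", p)
			continue
		}
		fmt.Printf("phase %s: got config %q\n", p, cfg)
	}
}
```

In the buggy runs the claim never left the Creating/Configuring phases, so the consumer kept hitting the error branch on every reconcile until the pods were respun.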
Tested in ODF 4.11.0-90.

StorageClassClaims with type "sharedfilesystem" are reaching Ready phase, but there are errors in the ocs-operator logs as described below. StorageClassClaims with type "blockpool" are not reaching Ready phase; there are errors in the ocs-operator logs on both consumer and provider. The ocs-operator pod on the provider cluster got into CrashLoopBackOff state.

From the provider cluster:

$ oc get pods -l name=ocs-operator
NAME                            READY   STATUS             RESTARTS       AGE
ocs-operator-548d896c89-z7nmg   0/1     CrashLoopBackOff   39 (94s ago)   3h12m

$ oc get csv ocs-operator.v4.11.0
NAME                   DISPLAY                       VERSION   REPLACES               PHASE
ocs-operator.v4.11.0   OpenShift Container Storage   4.11.0    ocs-operator.v4.10.2   Installing

-----------------------------------------
Output from the consumer:

$ oc get storageclassclaim -A
NAMESPACE           NAME                                          STORAGETYPE        PHASE
openshift-storage   test-storageclassclaim-cephfs                 sharedfilesystem   Ready
openshift-storage   test-storageclassclaim-rbd                    blockpool          Configuring
test-project        test-storageclassclaim-cephfs-test-project    sharedfilesystem   Ready
test-project        test-storageclassclaim-cephfs2-test-project   sharedfilesystem   Ready
test-project        test-storageclassclaim-rbd-test-project       blockpool          Configuring

$ oc get storageclassclaim -n test-project -n openshift-storage
NAME                            STORAGETYPE        PHASE
test-storageclassclaim-cephfs   sharedfilesystem   Ready
test-storageclassclaim-rbd      blockpool          Configuring

$ oc get sc
NAME                                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)                                 kubernetes.io/aws-ebs                   Delete          WaitForFirstConsumer   true                   24h
gp2-csi                                       ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   24h
gp3-csi                                       ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   24h
ocs-storagecluster-ceph-rbd                   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   23h
ocs-storagecluster-cephfs                     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   23h
test-storageclassclaim-cephfs                 openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   14h
test-storageclassclaim-cephfs-test-project    openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   14h
test-storageclassclaim-cephfs2-test-project   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   2m24s

-------------------------------------------
Output from the provider:

$ oc get storageclassclaim -A
NAMESPACE           NAME                                                 STORAGETYPE        PHASE
openshift-storage   storageclassclaim-3cb007f76e37dde41384f05bace3a916   sharedfilesystem   Ready
openshift-storage   storageclassclaim-54d0f4199966db79472bea3f118bd4bb   sharedfilesystem   Ready
openshift-storage   storageclassclaim-79ae51581e9ea848deddb135d3e7e324   blockpool
openshift-storage   storageclassclaim-84652ffb5666282fa23a5deb45b5e859   blockpool
openshift-storage   storageclassclaim-ccd98680a21d5146dbb28892ec353f87   sharedfilesystem   Ready

$ oc get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   28h
gp2-csi         ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   28h
gp3-csi         ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   28h

$ oc get cephblockpools -A
NAMESPACE           NAME                                                                  PHASE
openshift-storage   cephblockpool-storageconsumer-5efc53a1-aae7-4fe8-a3ad-e2ec9ea52777    Ready

$ oc get cephfilesystem -A
NAMESPACE           NAME                                ACTIVEMDS   AGE   PHASE
openshift-storage   ocs-storagecluster-cephfilesystem   1           27h   Ready

-------------------------------------------------
StorageClassClaims with type "sharedfilesystem" are in Ready state, e.g. test-storageclassclaim-cephfs. But many occurrences of the error given below are present in the ocs-operator pod on the consumer.
{"level":"error","ts":1654716099.2675238,"logger":"controller.storageclassclaim","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"test-storageclassclaim-cephfs","namespace":"openshift-storage","error":"failed to get StorageClassClaim config: rpc error: code = Unavailable desc = status is not set for storage class claim \"test-storageclassclaim-cephfs\" for \"9c185091-d7ab-4b94-9d42-1180ae6fc809\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

----------------------------------------------
StorageClassClaims with type "blockpool" are not Ready, e.g. test-storageclassclaim-rbd.

Error logs in the ocs-operator of the consumer:

{"level":"error","ts":1654713853.2544885,"logger":"controller.storageclassclaim","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"test-storageclassclaim-rbd","namespace":"openshift-storage","error":"failed to get StorageClassClaim config: rpc error: code = Unavailable desc = status is not set for storage class claim \"test-storageclassclaim-rbd\" for \"9c185091-d7ab-4b94-9d42-1180ae6fc809\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

Error logs in the ocs-operator of the provider:
{"level":"info","ts":1654770544.472081,"logger":"controller.storageclassclaim","msg":"Reconciling StorageClassClaim.","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"storageclassclaim-54d0f4199966db79472bea3f118bd4bb","namespace":"openshift-storage","StorageClassClaim":"openshift-storage/storageclassclaim-54d0f4199966db79472bea3f118bd4bb"}

{"level":"info","ts":1654770544.472141,"logger":"controller.storageclassclaim","msg":"Running StorageClassClaim controller in Converged/Provider Mode","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"storageclassclaim-54d0f4199966db79472bea3f118bd4bb","namespace":"openshift-storage","StorageClassClaim":"openshift-storage/storageclassclaim-54d0f4199966db79472bea3f118bd4bb"}

{"level":"info","ts":1654770544.662482,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":{"name":"ocs-metrics-exporter"}}

{"level":"info","ts":1654770544.662524,"logger":"controller.storageclassclaim","msg":"Reconciling StorageClassClaim.","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"storageclassclaim-79ae51581e9ea848deddb135d3e7e324","namespace":"openshift-storage","StorageClassClaim":"openshift-storage/storageclassclaim-79ae51581e9ea848deddb135d3e7e324"}

{"level":"info","ts":1654770544.6625803,"logger":"controller.storageclassclaim","msg":"Running StorageClassClaim controller in Converged/Provider Mode","reconciler group":"ocs.openshift.io","reconciler kind":"StorageClassClaim","name":"storageclassclaim-79ae51581e9ea848deddb135d3e7e324","namespace":"openshift-storage","StorageClassClaim":"openshift-storage/storageclassclaim-79ae51581e9ea848deddb135d3e7e324"}

panic: assignment to entry in nil map

----------------------------------------------
must-gather logs from the provider cluster:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j8-pr/jijoy-j8-pr_20220608T054725/logs/testcases_1654769209/

must-gather logs from the consumer cluster:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j8-cr/jijoy-j8-cr_20220608T095739/logs/testcases_1654769180/

=========================================================
Version:
ocs-operator.v4.11.0
ocs-osd-deployer.v2.0.2
odf-csi-addons-operator.v4.11.0
odf-operator.v4.11.0
ODF full version: 4.11.0-90
OCP 4.10.16
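The provider crash, "panic: assignment to entry in nil map", is a well-defined Go failure: writing to a map that was declared but never initialized panics at runtime. The sketch below reproduces the semantics with hypothetical names (it is not the actual ocs-operator code path); the fix pattern is the same either way, guarding every write with make().

```go
package main

import "fmt"

// claimStatus stands in for whatever struct the operator writes into;
// the Annotations field name is hypothetical.
type claimStatus struct {
	Annotations map[string]string
}

// setAnnotation writes safely: a nil map can be read from, but any write
// to it panics, so the map must be allocated before the first assignment.
func setAnnotation(s *claimStatus, k, v string) {
	if s.Annotations == nil {
		s.Annotations = make(map[string]string)
	}
	s.Annotations[k] = v
}

func main() {
	var s claimStatus // Annotations is nil here; an unguarded write would panic
	setAnnotation(&s, "phase", "Ready")
	fmt.Println(s.Annotations["phase"]) // prints "Ready"
}
```

Because a panic in a reconcile loop kills the whole operator process, a single claim hitting this path is enough to put the pod into CrashLoopBackOff, which matches the provider-side symptoms above.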
Changing the bug status to assigned because the issue is not completely fixed. Please let me know if we need to open a new bug.
(In reply to Jilju Joy from comment #6)
> Changing the bug status to assigned because the issue is not completely
> fixed. Please let me know if we need to open a new bug.

The panic shouldn't happen; we are testing it in another cluster to confirm. @Jilju, can you please also check on some other cluster?
Removed the PR since it was from master.
Verified in ODF version 4.11.0-98.
ocs-osd-deployer.v2.0.2

Created a StorageClassClaim on the consumer cluster. The StorageClassClaim and storage class were created successfully. The StorageClassClaim was created automatically on the provider cluster. Verified PVC creation, pod creation, and I/O. Tested both RBD and CephFS.

List of storage classes:
test-storageclassclaim-2cephfs-test-project   openshift-storage.cephfs.csi.ceph.com   Delete   Immediate   true   25m
test-storageclassclaim-2rbd-test-project      openshift-storage.rbd.csi.ceph.com      Delete   Immediate   true   17m
test-storageclassclaim-rbd-test-project       openshift-storage.rbd.csi.ceph.com      Delete   Immediate   true   23m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156