Description of problem:
RBD PVCs cannot be created on fresh clusters when the size option is set to 8 or 10. They fail with the error:
InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"

E.g.:
Name:          pvc-test-a83f3365ad834bf996cd793c7a39b20
Namespace:     namespace-test-2984fe87e8ec4534907c7ba73
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type     Reason                Age                From                                                                                                          Message
  ----     ------                ----               ----                                                                                                          -------
  Normal   Provisioning          31s (x7 over 62s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-65477c4f5-9kk8w_77954ce8-d024-42cc-9b93-1f58d176f537  External provisioner is provisioning volume for claim "namespace-test-2984fe87e8ec4534907c7ba73/pvc-test-a83f3365ad834bf996cd793c7a39b20"
  Warning  ProvisioningFailed    31s (x7 over 62s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-65477c4f5-9kk8w_77954ce8-d024-42cc-9b93-1f58d176f537  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"
  Normal   ExternalProvisioning  8s (x5 over 62s)   persistentvolume-controller                                                                                   waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.8

How reproducible:
4/4

Steps to Reproduce:
1. Deploy a provider cluster with size 8 or 20 and a consumer.
2. Create a PVC on the consumer that uses the RBD StorageClass.

Actual results:
The PVC is stuck in Pending state with the error:
InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"

Expected results:
The PVC is created.

Additional info:
There is a workaround: restart rook-ceph-operator.
Discussion about the issue: https://chat.google.com/room/AAAASHA9vWs/H_U9EtJfcPQ
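The error text above is what the CSI driver reports when it looks up the StorageClass clusterID ("openshift-storage") in its cluster configuration and finds no matching entry. The Python sketch below only illustrates that lookup. It assumes the configuration is a JSON array of entries with "clusterID" and "monitors" keys, following the ceph-csi config.json format. The function name, exception class, and sample monitor address are hypothetical and are not ceph-csi code or values from this report.

```python
import json

# Hypothetical ceph-csi style cluster config (assumed format: a JSON array of
# {"clusterID": ..., "monitors": [...]} entries). The entry and address below
# are placeholders; note there is no entry for "openshift-storage".
EXAMPLE_CONFIG = """
[
  {"clusterID": "some-other-cluster", "monitors": ["10.0.0.1:6789"]}
]
"""


class MissingClusterConfig(Exception):
    """Raised when no config entry matches the requested clusterID (hypothetical)."""


def fetch_monitors(config_json: str, cluster_id: str) -> list:
    """Return the monitor list for cluster_id, or raise with the same text seen in the PVC events."""
    for entry in json.loads(config_json):
        if entry.get("clusterID") == cluster_id:
            return list(entry.get("monitors", []))
    # No entry for this clusterID: reproduce the wording from the events.
    raise MissingClusterConfig(
        f"failed to fetch monitor list using clusterID ({cluster_id}): "
        f'missing configuration for cluster ID "{cluster_id}"'
    )


if __name__ == "__main__":
    try:
        fetch_monitors(EXAMPLE_CONFIG, "openshift-storage")
    except MissingClusterConfig as err:
        print(err)
```

Run as a script, it prints the same message seen in the ProvisioningFailed event. This is because the sample config has no "openshift-storage" entry.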
Hi, I hit the exact same issue on my setup: the PVC was stuck in Pending state. I fixed it by restarting the rook-ceph-operator.

The following error message was seen in the describe pvc output:
Warning  ProvisioningFailed  85m (x14 over 104m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-65477c4f5-9x4s7_286be8d2-73a0-4410-b8ed-d1ce7d4e187e  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"

Here are the versions of the components in my consumer cluster:

oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.9                      NooBaa Operator               4.10.9            mcg-operator.v4.10.8                      Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20            observability-operator.v0.0.19            Succeeded
ocs-operator.v4.10.9                      OpenShift Container Storage   4.10.9            ocs-operator.v4.10.8                      Succeeded
ocs-osd-deployer.v2.0.11                  OCS OSD Deployer              2.0.11-7          ocs-osd-deployer.v2.0.10                  Succeeded
odf-csi-addons-operator.v4.10.9           CSI Addons                    4.10.9            odf-csi-addons-operator.v4.10.8           Succeeded
odf-operator.v4.10.9                      OpenShift Data Foundation     4.10.9            odf-operator.v4.10.8                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.461-dbddf1f   Route Monitor Operator        0.1.461-dbddf1f   route-monitor-operator.v0.1.456-02ea942   Succeeded

oc version
Client Version: 4.12.0
Kustomize Version: v4.5.7
Server Version: 4.10.45
Kubernetes Version: v1.23.12+8a6bfe4
(In reply to Shekhar Berry from comment #1)
FYI, this was seen with a 4 TiB cluster. My setup consisted of 1 provider and 3 consumers. The issue appeared on only one consumer, and the other 2 worked fine out of the box.
Yash, can you please ACK whether the SRE workaround is acceptable from your perspective?
Closing this as WONTFIX because a workaround exists: restart the rook-ceph-operator.