Bug 2166900
| Summary: | RBD PVCs are not working with 8 TiB and 20 TiB clusters | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Filip Balák <fbalak> |
| Component: | odf-managed-service | Assignee: | Rewant <resoni> |
| Status: | CLOSED WONTFIX | QA Contact: | Filip Balák <fbalak> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.10 | CC: | cblum, nberry, ocs-bugs, odf-bz-bot, shberry, ykukreja |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2172101 | Environment: | |
| Last Closed: | 2023-07-03 14:16:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2172101 | | |
Hi,

I faced the exact issue in my setup, with a PVC stuck in Pending state. I fixed the issue by restarting the rook-ceph-operator.

The following error message was seen in describe pvc:

    Warning ProvisioningFailed 85m (x14 over 104m) openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-65477c4f5-9x4s7_286be8d2-73a0-4410-b8ed-d1ce7d4e187e failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"

Here are the versions of the various components in my consumer cluster:

    ocos get csv
    NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
    mcg-operator.v4.10.9                      NooBaa Operator               4.10.9            mcg-operator.v4.10.8                      Succeeded
    observability-operator.v0.0.20            Observability Operator        0.0.20            observability-operator.v0.0.19            Succeeded
    ocs-operator.v4.10.9                      OpenShift Container Storage   4.10.9            ocs-operator.v4.10.8                      Succeeded
    ocs-osd-deployer.v2.0.11                  OCS OSD Deployer              2.0.11-7          ocs-osd-deployer.v2.0.10                  Succeeded
    odf-csi-addons-operator.v4.10.9           CSI Addons                    4.10.9            odf-csi-addons-operator.v4.10.8           Succeeded
    odf-operator.v4.10.9                      OpenShift Data Foundation     4.10.9            odf-operator.v4.10.8                      Succeeded
    ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
    route-monitor-operator.v0.1.461-dbddf1f   Route Monitor Operator        0.1.461-dbddf1f   route-monitor-operator.v0.1.456-02ea942   Succeeded

    oc version
    Client Version: 4.12.0
    Kustomize Version: v4.5.7
    Server Version: 4.10.45
    Kubernetes Version: v1.23.12+8a6bfe4

(In reply to Shekhar Berry from comment #1)
> I faced the exact issue in my setup with PVC stuck in pending state. I fixed
> the issue by restarting the rook-ceph-operator.

This was seen with a 4 TiB cluster, FYI. My setup consisted of 1 provider and 3 consumers. The issue was seen on just one consumer; the other 2 worked fine out of the box.

Yash, can you please ACK whether the SRE workaround is acceptable from your perspective?

Closing this as Won't Fix, as we have a workaround for it: restart the rook-ceph-operator.
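When triaging a Pending PVC, it can help to confirm that it is failing with the same error reported here before applying the workaround. The following is a minimal sketch, not part of the original report: the function name `missing_cluster_id` is hypothetical, and it only matches the error text quoted in the comments above against whatever `oc describe pvc` output is passed in as a string.

```python
import re
from typing import Optional

# Error text copied from the ProvisioningFailed event quoted in this report.
_MISSING_CONFIG = re.compile(r'missing configuration for cluster ID "([^"]+)"')


def missing_cluster_id(describe_output: str) -> Optional[str]:
    """Return the cluster ID named in the error, or None if the error is absent.

    `describe_output` is the text printed by `oc describe pvc <name>`.
    (Hypothetical helper for illustration only.)
    """
    # Only treat the match as relevant when a provisioning failure was recorded.
    if "ProvisioningFailed" not in describe_output:
        return None
    match = _MISSING_CONFIG.search(describe_output)
    return match.group(1) if match else None
```

A non-`None` result suggests the PVC is hitting the failure described in this bug; `None` means the events point to some other cause.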
Description of problem:
RBD PVCs cannot be created on fresh clusters deployed with the 8 TiB or 20 TiB size option. They fail with the error:

    InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"

E.g.:

    Name:          pvc-test-a83f3365ad834bf996cd793c7a39b20
    Namespace:     namespace-test-2984fe87e8ec4534907c7ba73
    StorageClass:  ocs-storagecluster-ceph-rbd
    Status:        Pending
    Volume:
    Labels:        <none>
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
                   volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    Used By:       <none>
    Events:
      Type     Reason                Age                From                                                                                                                Message
      ----     ------                ----               ----                                                                                                                -------
      Normal   Provisioning          31s (x7 over 62s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-65477c4f5-9kk8w_77954ce8-d024-42cc-9b93-1f58d176f537  External provisioner is provisioning volume for claim "namespace-test-2984fe87e8ec4534907c7ba73/pvc-test-a83f3365ad834bf996cd793c7a39b20"
      Warning  ProvisioningFailed    31s (x7 over 62s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-65477c4f5-9kk8w_77954ce8-d024-42cc-9b93-1f58d176f537  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"
      Normal   ExternalProvisioning  8s (x5 over 62s)   persistentvolume-controller                                                                                         waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.8

How reproducible:
4/4

Steps to Reproduce:
1. Deploy a provider cluster with size 8 TiB or 20 TiB, and a consumer.
2. Create a PVC on the consumer that uses the RBD storage class.

Actual results:
The PVC is stuck in Pending state with the error:

    InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage): missing configuration for cluster ID "openshift-storage"

Expected results:
The PVC is created.

Additional info:
There is a workaround: restart the rook-ceph-operator.
Discussion about the issue: https://chat.google.com/room/AAAASHA9vWs/H_U9EtJfcPQ
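The report does not say how the rook-ceph-operator was restarted. As one possible way to script the workaround, the sketch below assumes the operator runs as a Deployment named `rook-ceph-operator` in the `openshift-storage` namespace and restarts it with `oc rollout restart`. Both names and the helper functions are assumptions for illustration; adjust them to the actual cluster.

```python
import subprocess

# Assumptions, not stated in the report: namespace and Deployment name.
NAMESPACE = "openshift-storage"
DEPLOYMENT = "rook-ceph-operator"


def restart_command(namespace=NAMESPACE, deployment=DEPLOYMENT):
    """Build the `oc` argument list that triggers a rollout restart."""
    return ["oc", "-n", namespace, "rollout", "restart", f"deployment/{deployment}"]


def restart_operator(namespace=NAMESPACE, deployment=DEPLOYMENT, run=subprocess.run):
    """Run the restart command; raises CalledProcessError if `oc` exits non-zero.

    `run` is injectable so the function can be exercised without a cluster.
    """
    return run(restart_command(namespace, deployment), check=True)
```

Calling `restart_operator()` requires a logged-in `oc` session with permission to modify the Deployment. Once the new operator pod is running, the Pending PVC should be re-checked; the report does not state how long provisioning takes to recover.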