Description of problem (please be as detailed as possible and provide log snippets):
RBD PVC creation fails on VMware.

Version of all relevant components (if applicable):
OCS operator: ocs-operator.v4.9.0-112.ci
OCP version: 4.9.0-0.nightly-2021-08-18-144658

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproducible?
Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an OCS 4.9 cluster over VMware
2. Create an RBD PVC (a sample manifest is sketched at the end of this comment)

Actual results:
Events:
  Type     Reason                Age                    From                                                                                                                 Message
  ----     ------                ----                   ----                                                                                                                 -------
  Warning  ProvisioningFailed    66m                    openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d68d86c5c-sj8sm_9103b034-edbd-4796-ab1c-115bb0cf713c  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  ProvisioningFailed    42m (x14 over 66m)     openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d68d86c5c-sj8sm_9103b034-edbd-4796-ab1c-115bb0cf713c  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-3fb011ec-4e83-402f-a2d6-184e6d31a083 already exists
  Normal   ExternalProvisioning  3m51s (x270 over 68m)  persistentvolume-controller                                                                                          waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          2m53s (x26 over 68m)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d68d86c5c-sj8sm_9103b034-edbd-4796-ab1c-115bb0cf713c  External provisioner is provisioning volume for claim "openshift-storage/test2"

Expected results:
PVC should be in Bound state.

Additional info:
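A minimal PVC manifest for step 2, assuming the claim "openshift-storage/test2" seen in the events above (the requested size is illustrative, not taken from this report):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test2
  namespace: openshift-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi          # illustrative size
  storageClassName: ocs-storagecluster-ceph-rbd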
Niels, can this be another instance of https://bugzilla.redhat.com/show_bug.cgi?id=1986794?

We issue a PVC creation request, but it never comes back:

>> 2021-08-25T16:10:26.610860120Z I0825 16:10:26.610816 1 utils.go:176] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe GRPC call: /csi.v1.Controller/CreateVolume
>> 2021-08-25T16:10:26.611284529Z I0825 16:10:26.611261 1 utils.go:180] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe GRPC request: {"capacity_range":{"required_bytes":42949672960},"name":"pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe","parameters":{"clusterID":"openshift-storage","csi.storage.k8s.io/pv/name":"pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe","csi.storage.k8s.io/pvc/name":"my-prometheus-claim-prometheus-k8s-0","csi.storage.k8s.io/pvc/namespace":"openshift-monitoring","imageFeatures":"layering","imageFormat":"2","pool":"ocs-storagecluster-cephblockpool","thickProvision":"false"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
>> 2021-08-25T16:10:26.611514312Z I0825 16:10:26.611501 1 rbd_util.go:1202] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe setting disableInUseChecks: false image features: [layering] mounter: rbd
2021-08-25T16:10:26.626951213Z E0825 16:10:26.626906 1 omap.go:77] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe omap not found (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volumes.default"): rados: ret=-2, No such file or directory
>> 2021-08-25T16:10:26.636517322Z I0825 16:10:26.636468 1 omap.go:154] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe set omap keys (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe:f5bb7432-05be-11ec-b0e7-0a580a810260])
>> 2021-08-25T16:10:26.641030165Z I0825 16:10:26.640965 1 omap.go:154] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe set omap keys (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volume.f5bb7432-05be-11ec-b0e7-0a580a810260"): map[csi.imagename:csi-vol-f5bb7432-05be-11ec-b0e7-0a580a810260 csi.volname:pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe csi.volume.owner:openshift-monitoring])
>> 2021-08-25T16:10:26.641106844Z I0825 16:10:26.641096 1 rbd_journal.go:484] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe generated Volume ID (0001-0011-openshift-storage-0000000000000002-f5bb7432-05be-11ec-b0e7-0a580a810260) and image name (csi-vol-f5bb7432-05be-11ec-b0e7-0a580a810260) for request name (pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe)
>> 2021-08-25T16:10:26.641242352Z I0825 16:10:26.641208 1 rbd_util.go:242] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe rbd: create ocs-storagecluster-cephblockpool/csi-vol-f5bb7432-05be-11ec-b0e7-0a580a810260 size 40960M (features: [layering]) using mon 172.30.120.106:6789,172.30.246.236:6789,172.30.193.17:6789
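For context, the parameters in the GRPC CreateVolume request above come straight from the StorageClass. Reconstructed from the logged request (secret references and other fields are omitted, so this is a sketch rather than the exact SC in the cluster), it corresponds to roughly:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-storagecluster-ceph-rbd
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: ocs-storagecluster-cephblockpool
  imageFeatures: layering
  imageFormat: "2"
  thickProvision: "false"
  csi.storage.k8s.io/fstype: ext4
  # provisioner/node secret references omitted for brevity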
(In reply to Mudit Agarwal from comment #4)
> Niels, can this be another instance of
> https://bugzilla.redhat.com/show_bug.cgi?id=1986794

Yes, quite possible. Bug 1986794 shows hangs while creating an RBD image (before thick-provisioning has started), and it hangs at the same location here.
We'll continue working in bz 1986794 for now. Once that is resolved, the problem reported in this bug might be fixed as well.
https://bugzilla.redhat.com/show_bug.cgi?id=2000434 is ON_QA, which means we have a fix in Ceph. This bug can be moved to ON_QA once we have an OCS build that includes the Ceph fix.
Created a test SC with thick provisioning and a pool, then created a PVC under this SC; the PVC is in Bound state:

default   rbd-test-pvc   Bound   pvc-046869ae-96f0-4f83-8ba2-158ab88b1322   1Gi   RWO   test-sc   52s

The manifests used are sketched below. Moving to VERIFIED.
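A sketch of the verification manifests, assuming a pool named "test-pool" (the pool name is illustrative and secret references are omitted; thickProvision is the parameter being exercised):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test-sc
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: test-pool             # illustrative name for the test pool
  imageFeatures: layering
  imageFormat: "2"
  thickProvision: "true"      # thick provisioning enabled for this test
  # provisioner/node secret references omitted for brevity
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: test-sc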
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:5086