Bug 1997738

Summary: RBD PVC creation fails on VMware
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Pratik Surve <prsurve>
Component: ceph
Assignee: Scott Ostapovicz <sostapov>
Status: CLOSED ERRATA
QA Contact: Anna Sandler <asandler>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.9
CC: asandler, bniver, ikave, kramdoss, madam, muagarwa, ndevos, ocs-bugs, odf-bz-bot
Target Milestone: ---
Keywords: AutomationBackLog, TestBlocker, Triaged
Target Release: ODF 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.9.0-164.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-13 17:45:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1986794, 2000434
Bug Blocks:

Description Pratik Surve 2021-08-25 17:49:58 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

RBD PVC creation fails on VMware 

Version of all relevant components (if applicable):

OCS operator:- ocs-operator.v4.9.0-112.ci
OCP version:- 4.9.0-0.nightly-2021-08-18-144658

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an OCS 4.9 cluster on VMware
2. Create an RBD PVC
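The reproduction in step 2 can be sketched as a minimal PVC manifest against the default RBD StorageClass. The claim name `test2` and namespace `openshift-storage` match the events in the Actual results below; the 1Gi size is an arbitrary choice for illustration.

```shell
# Minimal sketch: create an RBD PVC against the default
# ocs-storagecluster-ceph-rbd StorageClass.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test2
  namespace: openshift-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd
EOF

# Watch the claim; on an affected cluster it stays Pending instead of
# reaching Bound.
oc get pvc test2 -n openshift-storage -w
```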


Actual results:
Events:
  Type     Reason                Age                    From                                                                                                                Message
  ----     ------                ----                   ----                                                                                                                -------
  Warning  ProvisioningFailed    66m                    openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d68d86c5c-sj8sm_9103b034-edbd-4796-ab1c-115bb0cf713c  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  ProvisioningFailed    42m (x14 over 66m)     openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d68d86c5c-sj8sm_9103b034-edbd-4796-ab1c-115bb0cf713c  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-3fb011ec-4e83-402f-a2d6-184e6d31a083 already exists
  Normal   ExternalProvisioning  3m51s (x270 over 68m)  persistentvolume-controller                                                                                         waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          2m53s (x26 over 68m)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d68d86c5c-sj8sm_9103b034-edbd-4796-ab1c-115bb0cf713c  External provisioner is provisioning volume for claim "openshift-storage/test2"

Expected results:
PVC should be in Bound state

Additional info:

Comment 4 Mudit Agarwal 2021-08-26 01:29:00 UTC
Niels, can this be another instance of https://bugzilla.redhat.com/show_bug.cgi?id=1986794

We issue a PVC creation request, but it never returns:

>> 2021-08-25T16:10:26.610860120Z I0825 16:10:26.610816       1 utils.go:176] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe GRPC call: /csi.v1.Controller/CreateVolume
>> 2021-08-25T16:10:26.611284529Z I0825 16:10:26.611261       1 utils.go:180] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe GRPC request: {"capacity_range":{"required_bytes":42949672960},"name":"pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe","parameters":{"clusterID":"openshift-storage","csi.storage.k8s.io/pv/name":"pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe","csi.storage.k8s.io/pvc/name":"my-prometheus-claim-prometheus-k8s-0","csi.storage.k8s.io/pvc/namespace":"openshift-monitoring","imageFeatures":"layering","imageFormat":"2","pool":"ocs-storagecluster-cephblockpool","thickProvision":"false"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
>> 2021-08-25T16:10:26.611514312Z I0825 16:10:26.611501       1 rbd_util.go:1202] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe setting disableInUseChecks: false image features: [layering] mounter: rbd
2021-08-25T16:10:26.626951213Z E0825 16:10:26.626906       1 omap.go:77] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe omap not found (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volumes.default"): rados: ret=-2, No such file or directory
>> 2021-08-25T16:10:26.636517322Z I0825 16:10:26.636468       1 omap.go:154] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe set omap keys (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe:f5bb7432-05be-11ec-b0e7-0a580a810260])
>> 2021-08-25T16:10:26.641030165Z I0825 16:10:26.640965       1 omap.go:154] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe set omap keys (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volume.f5bb7432-05be-11ec-b0e7-0a580a810260"): map[csi.imagename:csi-vol-f5bb7432-05be-11ec-b0e7-0a580a810260 csi.volname:pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe csi.volume.owner:openshift-monitoring])
>> 2021-08-25T16:10:26.641106844Z I0825 16:10:26.641096       1 rbd_journal.go:484] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe generated Volume ID (0001-0011-openshift-storage-0000000000000002-f5bb7432-05be-11ec-b0e7-0a580a810260) and image name (csi-vol-f5bb7432-05be-11ec-b0e7-0a580a810260) for request name (pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe)
>> 2021-08-25T16:10:26.641242352Z I0825 16:10:26.641208       1 rbd_util.go:242] ID: 22 Req-ID: pvc-c0d0f028-61a0-425e-9e78-0d1baa30cabe rbd: create ocs-storagecluster-cephblockpool/csi-vol-f5bb7432-05be-11ec-b0e7-0a580a810260 size 40960M (features: [layering]) using mon 172.30.120.106:6789,172.30.246.236:6789,172.30.193.17:6789
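The log shows the journal omap entries being set and then the `rbd: create` call never returning. A rough sketch of how that hang point can be inspected from the rook-ceph toolbox pod (an assumption: that the toolbox is deployed with its usual `app=rook-ceph-tools` label; the pool and omap names come from the log above):

```shell
# Locate the rook-ceph toolbox pod (assumes it is deployed in
# openshift-storage with the standard label).
TOOLS=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n1)

# List the reserved volume names in the csi.volumes.default omap object
# referenced in the log above.
oc -n openshift-storage exec "$TOOLS" -- \
  rados -p ocs-storagecluster-cephblockpool listomapvals csi.volumes.default

# Check whether the image from the final "rbd: create" log line was ever
# created; on an affected cluster the create hangs and no image appears.
oc -n openshift-storage exec "$TOOLS" -- \
  rbd ls ocs-storagecluster-cephblockpool
```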

Comment 5 Niels de Vos 2021-09-01 06:24:43 UTC
(In reply to Mudit Agarwal from comment #4)
> Niels, can this be another instance of
> https://bugzilla.redhat.com/show_bug.cgi?id=1986794

Yes, quite possible. Bug 1986794 shows hangs while creating an RBD image (before thick provisioning starts); it hangs at the same location here.

Comment 6 Niels de Vos 2021-09-01 06:26:28 UTC
We'll continue working in bz 1986794 for now. Once that is resolved, the problem reported in this bug might be fixed as well.

Comment 7 Mudit Agarwal 2021-09-20 07:52:08 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2000434 is ON_QA, which means we have a fix in Ceph.

This can be moved to ON_QA once we have an OCS build with the Ceph fix.

Comment 14 Anna Sandler 2021-10-13 22:51:17 UTC
Created a test StorageClass with thick provisioning and a pool.
Created a PVC under this StorageClass, and it is in Bound state:

default                    rbd-test-pvc                                Bound    pvc-046869ae-96f0-4f83-8ba2-158ab88b1322   1Gi        RWO            test-sc                       52s

moving to verified
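The verification setup described in this comment can be sketched roughly as follows. The `test-sc` name matches the output above; the parameter names follow the GRPC request logged in comment 4, while the pool and secret names are assumptions (ODF defaults) and may differ on other clusters.

```shell
# Rough sketch: a StorageClass with thick provisioning enabled, as used
# for the verification. Pool and secret names are assumed ODF defaults.
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test-sc
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: ocs-storagecluster-cephblockpool
  imageFormat: "2"
  imageFeatures: layering
  thickProvision: "true"
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
EOF

# A PVC against this class should reach Bound, as in the output above.
oc get pvc rbd-test-pvc
```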

Comment 16 errata-xmlrpc 2021-12-13 17:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086