Bug 2010275

Summary: RBD PVC not provisioned in less than 60 sec.
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Avi Liani <alayani>
Component: odf-managed-service
Assignee: Nobody <nobody>
Status: ON_QA
QA Contact: Neha Berry <nberry>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.8
CC: aeyal, dbindra, ndevos, sgatfane
Target Milestone: ---
Keywords: TestBlocker, Tracking
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Sahina Bose 2021-10-06 12:43:10 UTC
Niels, the PV failed to delete - does this cause issues with future PVC requests? If not, what could be the cause of the errors seen with the PVC being stuck in the Pending state?

Name:            pvc-44abdab4-37ca-4262-b76f-838bef469383
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.rbd.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ocs-storagecluster-ceph-rbd
Status:          Released
Claim:           elastic-system/elasticsearch-data-quickstart-es-default-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            openshift-storage.rbd.csi.ceph.com
    FSType:            ext4
    VolumeHandle:      0001-0011-openshift-storage-0000000000000001-013e23cf-23a2-11ec-8d6a-0a580a80021f
    ReadOnly:          false
    VolumeAttributes:      clusterID=openshift-storage
                           csi.storage.k8s.io/pv/name=pvc-44abdab4-37ca-4262-b76f-838bef469383
                           csi.storage.k8s.io/pvc/name=elasticsearch-data-quickstart-es-default-0
                           csi.storage.k8s.io/pvc/namespace=elastic-system
                           imageFeatures=layering
                           imageFormat=2
                           imageName=csi-vol-013e23cf-23a2-11ec-8d6a-0a580a80021f
                           journalPool=ocs-storagecluster-cephblockpool
                           pool=ocs-storagecluster-cephblockpool
                           storage.kubernetes.io/csiProvisionerIdentity=1633187049810-8081-openshift-storage.rbd.csi.ceph.com
Events:
  Type     Reason              Age                  From                                                                                                                Message
  ----     ------              ----                 ----                                                                                                                -------
  Warning  VolumeFailedDelete  6m2s                 openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-867dd6b7cc-q84t7_ca66a4ac-0d00-4aac-ad99-a57792fd8310  rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  VolumeFailedDelete  51s (x8 over 5m54s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-867dd6b7cc-q84t7_ca66a4ac-0d00-4aac-ad99-a57792fd8310  rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000001-013e23cf-23a2-11ec-8d6a-0a580a80021f already exists

Comment 3 Niels de Vos 2021-10-06 13:57:54 UTC
Initial creation of the PVC was pretty fast according to the logs.

CreateVolume call received at 2021-10-02T16:58:42.711098247Z
CreateVolume call returned at 2021-10-02T16:58:47.915157032Z

So, creating the volume took only 5.2 seconds for Ceph-CSI.


Deleting the volume (PV pvc-44abdab4-37ca-4262-b76f-838bef469383) seems to have gotten stuck somehow.
The DeleteVolume request came in at 2021-10-02T19:47:54.472122421Z, but there was no real progress until the end of the logs (2021-10-02T19:55:35.110842025Z). That is more than 7 minutes, well beyond the time in which volumes usually get deleted.

At 2021-10-02T19:50:32.810707032Z a second delete attempt came in; it detected the previous in-progress request and therefore returned the "operation with given Volume ID .. already exists" error.
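For context, that detection typically works by keeping a per-volume-ID in-flight map in the driver: a request whose volume ID is already present gets rejected immediately with the Aborted error quoted above. The following is only a minimal illustrative sketch of such a guard, not ceph-csi's actual implementation (the volume ID is the one from the PV in comment 1):

```go
package main

import (
	"log"
	"sync"
)

// volumeLocks tracks volume IDs that currently have an operation in flight.
type volumeLocks struct {
	mu       sync.Mutex
	inFlight map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{inFlight: make(map[string]struct{})}
}

// tryAcquire returns false if an operation for volumeID is already running;
// that is the case in which the driver answers with the
// "an operation with the given Volume ID ... already exists" error.
func (v *volumeLocks) tryAcquire(volumeID string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, busy := v.inFlight[volumeID]; busy {
		return false
	}
	v.inFlight[volumeID] = struct{}{}
	return true
}

// release marks the operation for volumeID as finished.
func (v *volumeLocks) release(volumeID string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.inFlight, volumeID)
}

func main() {
	locks := newVolumeLocks()
	volID := "0001-0011-openshift-storage-0000000000000001-013e23cf-23a2-11ec-8d6a-0a580a80021f"

	if !locks.tryAcquire(volID) {
		// This is the path a second, overlapping delete attempt would hit.
		log.Printf("an operation with the given Volume ID %s already exists", volID)
		return
	}
	defer locks.release(volID)
	// ... the actual DeleteVolume work would run here ...
	log.Print("delete finished, lock released")
}
```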

It is unclear why the initial request to delete the volume hung and never returned. It is possible that, due to resource limits, the `rbd` command inside the provisioner Pod was not able to allocate sufficient resources (threads, memory) and was prevented from continuing. Further executions of the `rbd` command fail with `signal: killed`, which indicates that the container platform selected the process for termination. Depending on exactly how the resource allocation fails, we have seen the `rbd` command hang indefinitely.

Usually a failed volume deletion is not a problem, and new volumes can be created without issue. However, depending on how the new volume (same namespace/name?) was created, OpenShift may hold back the CreateVolume request to the provisioner until the DeleteVolume call has succeeded.

So, I think this would be the summary:

Problem: Deleting and recreating an RBD PVC fails.

Cause: The volume failed to be deleted due to resource restrictions.

Solution: Replace the `rbd` command at https://github.com/openshift/ceph-csi/blob/ad563f5bebb2efd5f64dee472e441bbe918fa101/internal/rbd/rbd_util.go#L1203 with go-ceph API calls. go-ceph needs fewer resources and re-uses existing connections to the Ceph cluster, so deleting the RBD volume is expected to succeed.
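For illustration only, a minimal standalone sketch of what removing an RBD image through the go-ceph bindings looks like. This is not the actual ceph-csi change; the pool and image names are taken from the PV above, and the sketch assumes the default ceph.conf/keyring locations are available inside the provisioner container:

```go
package main

import (
	"log"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

// deleteRBDImage removes an RBD image in-process via librados/librbd,
// instead of spawning an external `rbd rm` child process.
func deleteRBDImage(pool, image string) error {
	conn, err := rados.NewConn()
	if err != nil {
		return err
	}
	// Read monitors and keyring from the default ceph.conf location.
	if err := conn.ReadDefaultConfigFile(); err != nil {
		return err
	}
	if err := conn.Connect(); err != nil {
		return err
	}
	defer conn.Shutdown()

	// Open an IO context on the pool that holds the image.
	ioctx, err := conn.OpenIOContext(pool)
	if err != nil {
		return err
	}
	defer ioctx.Destroy()

	// Remove the image; no extra child processes are forked for this.
	return rbd.GetImage(ioctx, image).Remove()
}

func main() {
	// Pool and image name taken from the PV in comment 1, for illustration only.
	err := deleteRBDImage("ocs-storagecluster-cephblockpool",
		"csi-vol-013e23cf-23a2-11ec-8d6a-0a580a80021f")
	if err != nil {
		log.Fatalf("deleting RBD image failed: %v", err)
	}
	log.Print("RBD image deleted")
}
```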

Comment 5 Niels de Vos 2021-10-06 14:26:27 UTC
https://github.com/ceph/ceph-csi/issues/2558 has been created to track the work needed in upstream ceph-csi. (This depends on an unreleased version of go-ceph.)

As a workaround, it might be possible to increase the resource limits of the ceph-csi provisioner.

Comment 6 Sahina Bose 2021-10-06 16:29:51 UTC
Thanks Niels for the analysis!

Ohad, should we revisit the resource limits for the provisioner pods?

Comment 7 Ohad 2021-10-06 17:47:57 UTC
Sahina, this might be the solution for now.
Niels, can you offer some guidance around proper values?

Comment 10 Niels de Vos 2022-09-28 15:04:34 UTC
(In reply to Niels de Vos from comment #5)
> https://github.com/ceph/ceph-csi/issues/2558 has been created to track the
> work needed in upstream ceph-csi. (This depends on an unreleased version of
> go-ceph.)

This has been included in current versions. It might be possible to lower the resource limits again.

Comment 11 Dhruv Bindra 2022-10-31 12:00:56 UTC
Added the Tracking keyword and moved the bug to ON_QA, as the issue is fixed in ceph-csi.