Niels, the PV failed to delete - does this cause issues with future requests for PVCs? If not, what could be the cause of the errors seen with the PVC being stuck in the Pending state?

Name:            pvc-44abdab4-37ca-4262-b76f-838bef469383
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.rbd.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ocs-storagecluster-ceph-rbd
Status:          Released
Claim:           elastic-system/elasticsearch-data-quickstart-es-default-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            openshift-storage.rbd.csi.ceph.com
    FSType:            ext4
    VolumeHandle:      0001-0011-openshift-storage-0000000000000001-013e23cf-23a2-11ec-8d6a-0a580a80021f
    ReadOnly:          false
    VolumeAttributes:  clusterID=openshift-storage
                       csi.storage.k8s.io/pv/name=pvc-44abdab4-37ca-4262-b76f-838bef469383
                       csi.storage.k8s.io/pvc/name=elasticsearch-data-quickstart-es-default-0
                       csi.storage.k8s.io/pvc/namespace=elastic-system
                       imageFeatures=layering
                       imageFormat=2
                       imageName=csi-vol-013e23cf-23a2-11ec-8d6a-0a580a80021f
                       journalPool=ocs-storagecluster-cephblockpool
                       pool=ocs-storagecluster-cephblockpool
                       storage.kubernetes.io/csiProvisionerIdentity=1633187049810-8081-openshift-storage.rbd.csi.ceph.com
Events:
  Type     Reason              Age                  From                                                                                                                Message
  ----     ------              ---                  ----                                                                                                                -------
  Warning  VolumeFailedDelete  6m2s                 openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-867dd6b7cc-q84t7_ca66a4ac-0d00-4aac-ad99-a57792fd8310  rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  VolumeFailedDelete  51s (x8 over 5m54s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-867dd6b7cc-q84t7_ca66a4ac-0d00-4aac-ad99-a57792fd8310  rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000001-013e23cf-23a2-11ec-8d6a-0a580a80021f already exists
Initial creation of the PVC was quick, according to the logs:

  CreateVolume call received at 2021-10-02T16:58:42.711098247Z
  CreateVolume call returned at 2021-10-02T16:58:47.915157032Z

So creating the volume took Ceph-CSI only 5.2 seconds.

Deleting the volume (PV pvc-44abdab4-37ca-4262-b76f-838bef469383) seems to have gotten stuck somehow. The DeleteVolume request came in at 2021-10-02T19:47:54.472122421Z, but there was no real progress until the end of the logs (2021-10-02T19:55:35.110842025Z). That is more than 7 minutes, within which volumes usually get deleted. At 2021-10-02T19:50:32.810707032Z a second delete attempt came in, which detected the previous in-progress request and therefore returned the "operation with given Volume ID .. already exists" error.

It is unclear why the initial request to delete the volume hung and never returned. It is possible that, due to resource limits, the `rbd` command inside the provisioner Pod was not able to allocate sufficient resources (threads, memory) and was prevented from continuing. Further executions of the `rbd` command fail with `signal: killed`, which indicates that the container platform selected the process for termination. Depending on the exact failure of the resource allocations, we have seen the `rbd` command hang indefinitely.

Usually a failed volume deletion is not a problem, and new volumes can be created without issue. However, depending on how the new volume (same namespace/name?) was created, OpenShift may hold off on sending the CreateVolume request to the provisioner until the DeleteVolume call has succeeded.

So, I think this would be the summary:

Problem: Deleting and recreating an RBD PVC fails.
Cause: The volume failed to get deleted due to resource restrictions.
Solution: Replace the `rbd` command at https://github.com/openshift/ceph-csi/blob/ad563f5bebb2efd5f64dee472e441bbe918fa101/internal/rbd/rbd_util.go#L1203 with go-ceph API calls. By using go-ceph, fewer resources are needed and existing connections to the Ceph cluster are re-used, so deleting the RBD volume is expected to succeed (see the sketch below).
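For illustration, here is a minimal sketch of deleting an RBD image through the go-ceph bindings instead of spawning an external `rbd` process. This is not the actual ceph-csi implementation (the real change tracked upstream is more involved, with credential handling and connection caching); the pool and image names are copied from the PV above purely as placeholders:

```go
package main

import (
	"fmt"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

// deleteRBDImage removes an RBD image using the go-ceph bindings in-process,
// avoiding the thread/memory cost of forking an external `rbd rm` command.
func deleteRBDImage(pool, image string) error {
	// Use the default ceph.conf/keyring here; ceph-csi would instead build
	// the connection from the CSI secrets and re-use it across requests.
	conn, err := rados.NewConn()
	if err != nil {
		return fmt.Errorf("creating rados connection: %w", err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		return fmt.Errorf("reading ceph.conf: %w", err)
	}
	if err := conn.Connect(); err != nil {
		return fmt.Errorf("connecting to cluster: %w", err)
	}
	defer conn.Shutdown()

	ioctx, err := conn.OpenIOContext(pool)
	if err != nil {
		return fmt.Errorf("opening pool %q: %w", pool, err)
	}
	defer ioctx.Destroy()

	// Delete the image in-process via librbd; no child process is needed.
	if err := rbd.RemoveImage(ioctx, image); err != nil {
		return fmt.Errorf("removing image %q: %w", image, err)
	}
	return nil
}

func main() {
	// Pool and image name taken from the PV above, for illustration only.
	err := deleteRBDImage("ocs-storagecluster-cephblockpool",
		"csi-vol-013e23cf-23a2-11ec-8d6a-0a580a80021f")
	if err != nil {
		fmt.Println("delete failed:", err)
	}
}
```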
https://github.com/ceph/ceph-csi/issues/2558 has been created to track the work needed in upstream ceph-csi. (This depends on an unreleased version of go-ceph.) As a workaround, increasing the resource limits for the ceph-csi provisioner may help.
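For reference, a rough example of what raising the limits could look like; the container name (csi-rbdplugin) and the values are assumptions, not recommendations, and on OCS the operators may reconcile the Deployment, so the supported path may be through the operator configuration instead:

  oc -n openshift-storage set resources deployment/csi-rbdplugin-provisioner \
    -c csi-rbdplugin --requests=cpu=250m,memory=512Mi --limits=cpu=1,memory=1Gi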
Thanks, Niels, for the analysis! Ohad, should we revisit the resource limits for the provisioner pods?
Sahana, this might be the solution for now. Niels, can you offer some guidance on proper values?
(In reply to Niels de Vos from comment #5)
> https://github.com/ceph/ceph-csi/issues/2558 has been created to track the
> work needed in upstream ceph-csi. (This depends on an unreleased version of
> go-ceph.)

This has been included in current versions. It might be possible to lower the resource limits again.
Added Tracking keyword and moved the bug to ON_QA as the issue is fixed in ceph-csi