Bug 2106064 - PV stays in released state while using CSI drivers from Vsphere.
Summary: PV stays in released state while using CSI drivers from Vsphere.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.9
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jan Safranek
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-11 16:29 UTC by Palash Khaire
Modified: 2022-07-14 11:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-14 11:36:58 UTC
Target Upstream Version:
Embargoed:



Description Palash Khaire 2022-07-11 16:29:21 UTC
Description of problem: PVs stay in the Released state and are not deleted automatically. All of these PVs are created using a VolumeClaimTemplate in a pipeline run.

Actual results: Manual PVC creation and deletion seem to work. The PVC appears to be deleted, and I can delete the PV manually without any error.

CSI driver: v2.4.1 from vSphere

Adding other details in bug private comments.

Comment 2 Jan Safranek 2022-07-13 12:18:35 UTC
> We use CSI driver v2.4.1 from Vsphere

Unfortunately, we do not support this CSI driver in 4.9. We build and ship our own version of the CSI driver, which is available as Tech Preview in 4.9 behind the TechPreviewNoUpgrade feature set. I am lowering the severity accordingly.

While we could close this bug and refer the customer to vSphere, who ships the driver they're using, I think the same issue could be reproducible with our version too. From the symptoms it looks like the CSI driver is not idempotent and reports deletion of an already deleted volume as an error instead of success. Based on the discussion in the support ticket, I have a theory that it could be caused by load: when many PVs are about to be deleted, some deletions time out in the driver, but eventually succeed in vCenter. When the driver retries the deletion, the volume no longer exists and the driver /should/ return success, but it returns an error.

I will check if this theory is reproducible with the CSI driver that we ship. I can't fix the driver that the customer uses though.
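
To illustrate the idempotency expectation described above, here is a minimal Python sketch. It is not the vSphere CSI driver's code: `FakeBackend`, `VolumeNotFound`, and both delete functions are hypothetical names made up for this illustration. It only shows the difference between treating "volume not found" as success (which is what the CSI specification requires of DeleteVolume) and reporting it as an error.

```python
class VolumeNotFound(Exception):
    """Raised by the hypothetical backend when the volume does not exist."""


class FakeBackend:
    """In-memory stand-in for the storage backend; not a real vCenter API."""

    def __init__(self, volumes):
        self.volumes = set(volumes)

    def delete(self, volume_id):
        if volume_id not in self.volumes:
            raise VolumeNotFound(volume_id)
        self.volumes.remove(volume_id)


def delete_volume_idempotent(backend, volume_id):
    """Return True once the volume is gone, whether or not this call removed it."""
    try:
        backend.delete(volume_id)
    except VolumeNotFound:
        # The volume is already gone, e.g. an earlier attempt timed out in the
        # driver but succeeded in the backend. Treat this as success.
        pass
    return True


def delete_volume_strict(backend, volume_id):
    """Non-idempotent variant: a repeated delete surfaces as an error."""
    backend.delete(volume_id)
    return True
```

With the strict variant, a second delete of the same volume raises `VolumeNotFound`, so the caller keeps seeing a failure and the PV would stay in the Released state. The idempotent variant returns success on every repeat.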

Comment 3 Jan Safranek 2022-07-14 11:36:58 UTC
I can't reproduce it with the vSphere CSI driver that is shipped in OCP 4.9, 4.10 and 4.11.

- I tried to delete a volume in vCenter before deleting a PVC. The driver always recognized that the volume was already deleted and deleted the PV (sometimes after a few retries).
- I tried to delete 100 PVCs at the same time, and the driver timed out deleting most of them. The next retry always recognized that the volumes were already deleted and deleted the PVs. Again, a few retries were necessary and I could see the error message mentioned above, but the CSI driver recovered eventually.
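
The second scenario above can be modelled with a small Python simulation. This is only a sketch under stated assumptions, not the actual test that was run: `SlowBackend`, `DeleteTimeout`, `VolumeNotFound`, and `reconcile_delete` are hypothetical names, and the backend is assumed to complete every deletion while reporting a timeout to the caller on the first attempt.

```python
class DeleteTimeout(Exception):
    """The caller gave up waiting, although the backend may have finished."""


class VolumeNotFound(Exception):
    """The hypothetical backend no longer knows the volume."""


class SlowBackend:
    """Hypothetical backend: removes the volume, then reports a timeout."""

    def __init__(self, volumes):
        self.volumes = set(volumes)

    def delete(self, volume_id):
        if volume_id not in self.volumes:
            raise VolumeNotFound(volume_id)
        # The deletion goes through in the backend ...
        self.volumes.remove(volume_id)
        # ... but the caller only sees a timeout.
        raise DeleteTimeout(volume_id)


def reconcile_delete(backend, volume_id, max_attempts=5):
    """Retry until the volume is gone; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            backend.delete(volume_id)
            return attempt
        except VolumeNotFound:
            # Already deleted by an earlier, timed-out attempt: success.
            return attempt
        except DeleteTimeout:
            continue  # try again on the next reconcile
    raise RuntimeError(f"gave up deleting {volume_id}")


volume_ids = [f"vol-{i}" for i in range(100)]
backend = SlowBackend(volume_ids)
attempts = {v: reconcile_delete(backend, v) for v in volume_ids}
```

In this model every volume needs exactly two attempts: the first times out, and the retry finds the volume gone and reports success, so all deletions eventually converge.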

I am closing this issue, as our CSI driver seems to be ok. Feel free to reopen if the customer is able to reproduce it with the CSI driver that we ship and provide the driver logs (= must-gather).

