Description of problem:
Volume is not deleted after destroy cluster.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-27-144113

How reproducible:
Always

Steps to Reproduce:
1. Install OCP on the IBM platform.
2. Create a PVC/pod with Retain or Delete reclaim policy:

oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS                 REASON   AGE
pvc-a084a0fa-e705-4a3f-8ecf-caea1a485f59   10Gi       RWO            Delete           Bound    test3/myclaim5   ibmc-vpc-block-10iops-tier            15s
pvc-a5c664c1-0eda-4507-93af-142d8cbf94db   11Gi       RWO            Retain           Bound    test1/myclaim5   sc-zone2                              3h
pvc-fb399c15-fa5c-45f4-a2bf-85641cdeb25d   10Gi       RWO            Retain           Bound    test2/myclaim5   sc-zone2                              23m

3. Destroy the cluster.

Actual results:
Volumes are not deleted from the backend (https://cloud.ibm.com/vpc-ext/storage/storageVolumes).

Expected results:
Volumes should be deleted.

Master Log:
Node Log (of failed PODs):
PV Dump:
PVC Dump:
StorageClass Dump (if StorageClass used by PV/PVC):
Additional info:
The driver should tag the volumes it creates in a way that `openshift destroy cluster` can find and delete them. For example, on AWS we tag all volumes with "kubernetes.io/cluster/<cluster-id>: owned" and the installer deletes those. The AWS EBS CSI driver receives the tag as a command-line parameter, and our AWS EBS CSI driver operator provides that parameter.
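For illustration only, a minimal sketch of how such an ownership tag could be derived from a cluster ID passed on the command line. The `--cluster-id` flag and `buildClusterTag` helper are hypothetical and not the actual AWS EBS CSI driver or operator code:

```go
package main

import (
	"flag"
	"fmt"
)

// buildClusterTag returns the ownership tag the installer looks for when
// destroying a cluster, following the "kubernetes.io/cluster/<cluster-id>: owned"
// convention described above. Helper and flag names are illustrative.
func buildClusterTag(clusterID string) (key, value string) {
	return fmt.Sprintf("kubernetes.io/cluster/%s", clusterID), "owned"
}

func main() {
	clusterID := flag.String("cluster-id", "", "infrastructure cluster ID (hypothetical flag)")
	flag.Parse()

	key, value := buildClusterTag(*clusterID)
	// In a real driver this key/value pair would be attached to every volume
	// at creation time so the uninstaller can list and delete them later.
	fmt.Printf("%s: %s\n", key, value)
}
```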
This is not supported as of now; we do support this on managed clusters. Can we discuss this? It requires changes on our side, and for that we need to understand the `openshift destroy cluster` process.
We need to add support in the installer and the CSI driver, similar to https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go, so there will be three PRs:
1. Add the file `destroy/ibm/disk.go` to the installer.
2. Update https://github.com/kubernetes-sigs/ibm-vpc-block-csi-driver to read `--extra-labels` and add the tag while creating the volume in VPC (a sketch of the flag handling follows below).
3. Update the CSI driver deployment file to pass the tag details, i.e. https://github.com/openshift/gcp-pd-csi-driver-operator/blob/223a251c3ba39d8af605258d14794b32a5cfafda/assets/controller.yaml#L51

This will take time; we will raise the PRs by the end of next week. Can you please provide an env to test it?
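A minimal sketch of the flag handling from item 2, assuming a `key:value` comma-separated format for `--extra-labels`; the actual ibm-vpc-block-csi-driver flag name and format may differ:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseExtraLabels turns a "k1:v1,k2:v2" string into a map that can be merged
// into the tags the driver attaches when it creates a VPC volume. The format
// is an assumption for illustration.
func parseExtraLabels(raw string) (map[string]string, error) {
	labels := map[string]string{}
	if raw == "" {
		return labels, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		kv := strings.SplitN(pair, ":", 2)
		if len(kv) != 2 || kv[0] == "" {
			return nil, fmt.Errorf("invalid label %q, expected key:value", pair)
		}
		labels[kv[0]] = kv[1]
	}
	return labels, nil
}

func main() {
	extraLabels := flag.String("extra-labels", "", "extra labels to add to every created volume (illustrative)")
	flag.Parse()

	labels, err := parseExtraLabels(*extraLabels)
	if err != nil {
		panic(err)
	}
	// In the real driver these labels would be passed to the VPC volume
	// create call so the installer can later find cluster-owned volumes.
	fmt.Println(labels)
}
```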
Some additional OpenShift installer code references for you (related to the issue):
- Implement a Go module similar to the GCP one under `installer/pkg/destroy/ibmcloud/disk.go` (https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go).
- Implement a destroyDisks function (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/gcp/disk.go#L86); see the sketch after this list.
- Update the ibmcloud uninstaller to invoke that function at the appropriate point in the teardown process (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/ibmcloud/ibmcloud.go#L94).
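A rough sketch of what such a destroyDisks function could look like for the ibmcloud uninstaller. It only mirrors the shape of the GCP implementation linked above; the `volumeClient` interface and its `listClusterVolumes`/`deleteVolume` methods are placeholders, not the real installer or IBM VPC SDK API:

```go
package ibmcloud

import (
	"context"
	"fmt"
)

// volume and volumeClient are placeholders standing in for the IBM VPC SDK
// types a real implementation would use.
type volume struct {
	ID   string
	Name string
}

type volumeClient interface {
	// listClusterVolumes returns the volumes carrying the cluster ownership tag.
	listClusterVolumes(ctx context.Context, clusterID string) ([]volume, error)
	// deleteVolume deletes a single volume by ID.
	deleteVolume(ctx context.Context, id string) error
}

// destroyDisks lists the disks that belong to the cluster, deletes each one,
// and returns an aggregate error so the uninstaller can retry on a later pass
// instead of aborting the whole teardown.
func destroyDisks(ctx context.Context, client volumeClient, clusterID string) error {
	volumes, err := client.listClusterVolumes(ctx, clusterID)
	if err != nil {
		return fmt.Errorf("failed to list volumes: %w", err)
	}

	var errs []error
	for _, v := range volumes {
		if err := client.deleteVolume(ctx, v.ID); err != nil {
			errs = append(errs, fmt.Errorf("failed to delete disk name=%s, id=%s: %w", v.Name, v.ID, err))
		}
	}
	if len(errs) > 0 {
		return fmt.Errorf("destroyDisks: %d volume(s) could not be deleted: %v", len(errs), errs)
	}
	return nil
}
```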
I was trying to code this, but I checked VPC support for volume tagging and found that it is not supported, so I opened a JIRA for the VPC team: https://jiracloud.swg.usma.ibm.com:8443/browse/API-3298
This requires support from the IBM VPC side. I opened the JIRA; once that is fixed, we will add the support in the CSI driver. This will take time; the tentative date is 3/15.
The doc update for this is being tracked in https://github.com/openshift/openshift-docs/pull/42293. Specifically, the topic that details how to uninstall a cluster.
Tentative date would be mid-April.
Due to other priorities we could not make this, so the next tentative date would be 31st May. We also depend on being able to create a cluster with the installer, and we are currently unable to do so, which will impact this delivery as well. The cluster creation issue was discussed here: https://coreos.slack.com/archives/C01U40AM37F/p1652074667322439 and a Bugzilla was opened: https://bugzilla.redhat.com/show_bug.cgi?id=2083006
sameer.shaikh is working on this issue
Jan Safranek, do we know if GCP and AWS today skip the volumes with reclaim policy Retain? I don't see any filtering out of such volumes.
As discussed with jsafrane, we don't need to worry about the volumes that have reclaim policy "Retain"; all volumes can be deleted. Slack discussion: https://coreos.slack.com/archives/C01U40AM37F/p1654776475215769

As discussed with cschaefe, we will follow the existing flow of not exiting the installer in case volume deletion is stuck, but throw an error that suggests how to clean up the volumes manually. For example:

"Failed to delete disk name=myvolume-update-sam, id=r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8. If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8"

Slack discussion: https://coreos.slack.com/archives/C01U40AM37F/p1654836366652279?thread_ts=1654767524.676699&cid=C01U40AM37F
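A small sketch of that warn-and-continue behavior, assuming a generic logger rather than the installer's actual logging setup; `reportStuckVolume` is a hypothetical helper and the message text simply mirrors the example above:

```go
package ibmcloud

import "log"

// reportStuckVolume logs the manual-cleanup hint instead of aborting the
// uninstaller: the teardown keeps going and the user gets guidance on how
// to remove the volume by hand if deletion stays stuck.
func reportStuckVolume(name, id string, err error) {
	log.Printf("Failed to delete disk name=%s, id=%s: %v. "+
		"If this error continues to persist for more than 20 minutes then please "+
		"try to manually cleanup the volume using - ibmcloud is vold %s",
		name, id, err, id)
}
```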
Volumes could be deleted after destroying the cluster. Build: openshift/ibm-vpc-block-csi-driver#14, openshift/ibm-vpc-node-label-updater#11, openshift/ibm-vpc-block-csi-driver-operator#40
All PRs are merged.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069