Bug 2047732
| Summary: | [IBM]Volume is not deleted after destroy cluster | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chao Yang <chaoyang> |
| Component: | Storage | Assignee: | Sameer Shaikh <sameer.shaikh> |
| Storage sub component: | Storage | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | Mike Pytlak <mpytlak> |
| Severity: | medium | ||
| Priority: | unspecified | CC: | aos-bugs, arahamad, cschaefe, jnowicki, jsafrane, sameer.shaikh, wduan |
| Version: | 4.10 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Removed functionality | |
| Doc Text: |
The fix for this issue will be in 4.11. For 4.10, a doc update to the following section will be needed.
https://deploy-preview-39767--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_ibm_cloud_public/uninstalling-cluster-ibm-cloud.html
Doc update text will be provided.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-23 19:39:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Chao Yang
2022-01-28 12:15:06 UTC
The driver should tag the volumes it creates in a way that `openshift destroy cluster` can find it and delete it. For example, on AWS we use tag "kubernetes.io/cluster/<cluster-id>: owned" to tag all volumes and installer deletes those. AWS CSI driver gets it as a cmdline parameter + our AWS EBS CSI driver operator provides the parameter. This is not supported as of now, we do support this on managed cluster. Can we discuss on this as this require changes to support and for that we need to understand the `openshift destroy cluster` process. We need to add the support in the installer and csi driver similar to https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go so there will be three PR 1- one to add file `destroy/ibm/disk.go` 2- in https://github.com/kubernetes-sigs/ibm-vpc-block-csi-driver to read `--extra-labels` add the tag while creating volume from vpc 3- update the csi driver deployment file to get the tag detail i.e https://github.com/openshift/gcp-pd-csi-driver-operator/blob/223a251c3ba39d8af605258d14794b32a5cfafda/assets/controller.yaml#L51 This will take time, we will raise PR by end of next week. Can you please provide the env to test it. some additional openshift installer code references for you (related to the issue) - implement a similar go module as gcp under installer/pkg/destroy/ibmcloud/disk.go (https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go) - implement a destroyDisks function (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/gcp/disk.go#L86) - update the ibmcloud uninstaller to invoke that function at the appropriate point in the tear down process (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/ibmcloud/ibmcloud.go#L94) I was trying to code it but checked vpc support for volume tagging and found that its not supported hence opened JIRA for them https://jiracloud.swg.usma.ibm.com:8443/browse/API-3298 This require support from IBM VPC side, I opened the JIRA once that is fix then we will add the support in the csi driver, this will take time tentative date can be 3/15 The doc update for this is being tracked in https://github.com/openshift/openshift-docs/pull/42293. Specifically, the topic that details how to uninstall a cluster. tentative date would be mid April due to other priorities we could not make this so next tentative date would be 31st May, also we have dependency to create cluster by using installer and we are not able to create which will impact this delivery as well. cluster creation issue was discussed here https://coreos.slack.com/archives/C01U40AM37F/p1652074667322439 and opened bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=2083006 sameer.shaikh is working on this issue Jan Safranek do we know if gcp and aws today skip the volumes with reclaim policy (Retain). I don't see any filtering out of such volumes ? As discussed with jsafrane we don't need to worry about the Volumes which are having reclaim policy "Retain". All volumes can be deleted Slack discussion -- https://coreos.slack.com/archives/C01U40AM37F/p1654776475215769 As discussed with cschaefe we will follow the existing flow of not exiting the installer in case volume deletion is stuck but throw the error which will suggest how to cleanup the volumes manually. For example -- "Failed to delete disk name=myvolume-update-sam, id=r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8.If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8" Slack discussion -- https://coreos.slack.com/archives/C01U40AM37F/p1654836366652279?thread_ts=1654767524.676699&cid=C01U40AM37F Volume could be deleted after destroy cluster. build openshift/ibm-vpc-block-csi-driver#14,openshift/ibm-vpc-node-label-updater#11,openshift/ibm-vpc-block-csi-driver-operator#40 All PRS are merged. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069 |