Description of problem:
Volume is not deleted after destroy cluster.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-27-144113

How reproducible:
Always

Steps to Reproduce:
1. Install OCP on the IBM platform.
2. Create a PVC/pod with Retain or Delete reclaim policy:

oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS                 REASON   AGE
pvc-a084a0fa-e705-4a3f-8ecf-caea1a485f59   10Gi       RWO            Delete           Bound    test3/myclaim5   ibmc-vpc-block-10iops-tier            15s
pvc-a5c664c1-0eda-4507-93af-142d8cbf94db   11Gi       RWO            Retain           Bound    test1/myclaim5   sc-zone2                              3h
pvc-fb399c15-fa5c-45f4-a2bf-85641cdeb25d   10Gi       RWO            Retain           Bound    test2/myclaim5   sc-zone2                              23m

3. Destroy the cluster.

Actual results:
Volumes are not deleted from the backend (https://cloud.ibm.com/vpc-ext/storage/storageVolumes).

Expected results:
Volumes should be deleted.

Master Log:
Node Log (of failed PODs):
PV Dump:
PVC Dump:
StorageClass Dump (if StorageClass used by PV/PVC):
Additional info:
The driver should tag the volumes it creates in a way that `openshift destroy cluster` can find and delete them. For example, on AWS we tag all volumes with "kubernetes.io/cluster/<cluster-id>: owned" and the installer deletes those. The AWS EBS CSI driver receives the tag as a command-line parameter, and our AWS EBS CSI driver operator provides that parameter.
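For illustration only, a minimal sketch of how such an ownership tag could be derived from a cluster ID passed on the command line. The `--cluster-id` flag and `buildClusterTag` helper are hypothetical and not the actual AWS EBS CSI driver or operator code:

```go
package main

import (
	"flag"
	"fmt"
)

// buildClusterTag returns the ownership tag the installer looks for when
// destroying a cluster, following the "kubernetes.io/cluster/<cluster-id>: owned"
// convention described above. Helper and flag names are illustrative.
func buildClusterTag(clusterID string) (key, value string) {
	return fmt.Sprintf("kubernetes.io/cluster/%s", clusterID), "owned"
}

func main() {
	clusterID := flag.String("cluster-id", "", "infrastructure cluster ID (hypothetical flag)")
	flag.Parse()

	key, value := buildClusterTag(*clusterID)
	// In a real driver this key/value pair would be attached to every volume
	// at creation time so the uninstaller can list and delete them later.
	fmt.Printf("%s: %s\n", key, value)
}
```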
This is not supported as of now; we do support this on managed clusters. Can we discuss this? It requires changes on our side, and for that we need to understand the `openshift destroy cluster` process.
We need to add support in the installer and the CSI driver, similar to https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go, so there will be three PRs:
1. Add the file `destroy/ibm/disk.go` to the installer.
2. Update https://github.com/kubernetes-sigs/ibm-vpc-block-csi-driver to read `--extra-labels` and add the tag while creating the volume in VPC (a sketch of the flag handling follows below).
3. Update the CSI driver deployment file to pass the tag details, i.e. https://github.com/openshift/gcp-pd-csi-driver-operator/blob/223a251c3ba39d8af605258d14794b32a5cfafda/assets/controller.yaml#L51

This will take time; we will raise the PRs by the end of next week. Can you please provide an env to test it?
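A minimal sketch of the flag handling from item 2, assuming a `key:value` comma-separated format for `--extra-labels`; the actual ibm-vpc-block-csi-driver flag name and format may differ:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseExtraLabels turns a "k1:v1,k2:v2" string into a map that can be merged
// into the tags the driver attaches when it creates a VPC volume. The format
// is an assumption for illustration.
func parseExtraLabels(raw string) (map[string]string, error) {
	labels := map[string]string{}
	if raw == "" {
		return labels, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		kv := strings.SplitN(pair, ":", 2)
		if len(kv) != 2 || kv[0] == "" {
			return nil, fmt.Errorf("invalid label %q, expected key:value", pair)
		}
		labels[kv[0]] = kv[1]
	}
	return labels, nil
}

func main() {
	extraLabels := flag.String("extra-labels", "", "extra labels to add to every created volume (illustrative)")
	flag.Parse()

	labels, err := parseExtraLabels(*extraLabels)
	if err != nil {
		panic(err)
	}
	// In the real driver these labels would be passed to the VPC volume
	// create call so the installer can later find cluster-owned volumes.
	fmt.Println(labels)
}
```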
Some additional OpenShift installer code references for you (related to the issue):
- Implement a Go module similar to the GCP one under `installer/pkg/destroy/ibmcloud/disk.go` (https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go).
- Implement a destroyDisks function (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/gcp/disk.go#L86); see the sketch after this list.
- Update the ibmcloud uninstaller to invoke that function at the appropriate point in the teardown process (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/ibmcloud/ibmcloud.go#L94).
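A rough sketch of what such a destroyDisks function could look like for the ibmcloud uninstaller. It only mirrors the shape of the GCP implementation linked above; the `volumeClient` interface and its `listClusterVolumes`/`deleteVolume` methods are placeholders, not the real installer or IBM VPC SDK API:

```go
package ibmcloud

import (
	"context"
	"fmt"
)

// volume and volumeClient are placeholders standing in for the IBM VPC SDK
// types a real implementation would use.
type volume struct {
	ID   string
	Name string
}

type volumeClient interface {
	// listClusterVolumes returns the volumes carrying the cluster ownership tag.
	listClusterVolumes(ctx context.Context, clusterID string) ([]volume, error)
	// deleteVolume deletes a single volume by ID.
	deleteVolume(ctx context.Context, id string) error
}

// destroyDisks lists the disks that belong to the cluster, deletes each one,
// and returns an aggregate error so the uninstaller can retry on a later pass
// instead of aborting the whole teardown.
func destroyDisks(ctx context.Context, client volumeClient, clusterID string) error {
	volumes, err := client.listClusterVolumes(ctx, clusterID)
	if err != nil {
		return fmt.Errorf("failed to list volumes: %w", err)
	}

	var errs []error
	for _, v := range volumes {
		if err := client.deleteVolume(ctx, v.ID); err != nil {
			errs = append(errs, fmt.Errorf("failed to delete disk name=%s, id=%s: %w", v.Name, v.ID, err))
		}
	}
	if len(errs) > 0 {
		return fmt.Errorf("destroyDisks: %d volume(s) could not be deleted: %v", len(errs), errs)
	}
	return nil
}
```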
I was trying to code this, but I checked VPC support for volume tagging and found that it is not supported, so I opened a JIRA for the VPC team: https://jiracloud.swg.usma.ibm.com:8443/browse/API-3298
This requires support from the IBM VPC side. I opened the JIRA; once that is fixed, we will add the support in the CSI driver. This will take time; the tentative date is 3/15.
The doc update for this is being tracked in https://github.com/openshift/openshift-docs/pull/42293. Specifically, the topic that details how to uninstall a cluster.
Tentative date would be mid-April.
Due to other priorities we could not make this, so the next tentative date would be 31st May. We also depend on being able to create a cluster with the installer, and we are currently unable to do so, which will impact this delivery as well. The cluster creation issue was discussed here: https://coreos.slack.com/archives/C01U40AM37F/p1652074667322439 and a Bugzilla was opened: https://bugzilla.redhat.com/show_bug.cgi?id=2083006
sameer.shaikh is working on this issue
Jan Safranek, do we know if GCP and AWS today skip the volumes with reclaim policy Retain? I don't see any filtering out of such volumes.
As discussed with jsafrane, we don't need to worry about the volumes that have reclaim policy "Retain"; all volumes can be deleted. Slack discussion: https://coreos.slack.com/archives/C01U40AM37F/p1654776475215769

As discussed with cschaefe, we will follow the existing flow of not exiting the installer in case volume deletion is stuck, but throw an error that suggests how to clean up the volumes manually. For example:

"Failed to delete disk name=myvolume-update-sam, id=r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8. If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8"

Slack discussion: https://coreos.slack.com/archives/C01U40AM37F/p1654836366652279?thread_ts=1654767524.676699&cid=C01U40AM37F
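A small sketch of that warn-and-continue behavior, assuming a generic logger rather than the installer's actual logging setup; `reportStuckVolume` is a hypothetical helper and the message text simply mirrors the example above:

```go
package ibmcloud

import "log"

// reportStuckVolume logs the manual-cleanup hint instead of aborting the
// uninstaller: the teardown keeps going and the user gets guidance on how
// to remove the volume by hand if deletion stays stuck.
func reportStuckVolume(name, id string, err error) {
	log.Printf("Failed to delete disk name=%s, id=%s: %v. "+
		"If this error continues to persist for more than 20 minutes then please "+
		"try to manually cleanup the volume using - ibmcloud is vold %s",
		name, id, err, id)
}
```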
Volumes could be deleted after destroying the cluster. Build: openshift/ibm-vpc-block-csi-driver#14, openshift/ibm-vpc-node-label-updater#11, openshift/ibm-vpc-block-csi-driver-operator#40
All PRs are merged.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069