Bug 2047732

Summary: [IBM]Volume is not deleted after destroy cluster
Product: OpenShift Container Platform
Reporter: Chao Yang <chaoyang>
Component: Storage
Assignee: Sameer Shaikh <sameer.shaikh>
Storage sub component: Storage
QA Contact: Chao Yang <chaoyang>
Status: CLOSED ERRATA
Docs Contact: Mike Pytlak <mpytlak>
Severity: medium
Priority: unspecified
CC: aos-bugs, arahamad, cschaefe, jnowicki, jsafrane, sameer.shaikh, wduan
Version: 4.10   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Removed functionality
Doc Text:
The fix for this issue will be in 4.11. For 4.10, the following documentation section needs an update: https://deploy-preview-39767--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_ibm_cloud_public/uninstalling-cluster-ibm-cloud.html The doc update text will be provided.
Last Closed: 2022-08-23 19:39:39 UTC
Type: Bug

Description Chao Yang 2022-01-28 12:15:06 UTC
Description of problem:
Volumes are not deleted after the cluster is destroyed.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-27-144113

How reproducible:
Always

Steps to Reproduce:
1. Install OCP on the IBM Cloud platform.
2. Create PVCs and pods with the Retain or Delete reclaim policy:
$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS                 REASON   AGE
pvc-a084a0fa-e705-4a3f-8ecf-caea1a485f59   10Gi       RWO            Delete           Bound    test3/myclaim5   ibmc-vpc-block-10iops-tier            15s
pvc-a5c664c1-0eda-4507-93af-142d8cbf94db   11Gi       RWO            Retain           Bound    test1/myclaim5   sc-zone2                              3h
pvc-fb399c15-fa5c-45f4-a2bf-85641cdeb25d   10Gi       RWO            Retain           Bound    test2/myclaim5   sc-zone2                              23m

3. Destroy the cluster.

Actual results:
Volumes are not deleted from the backend:
https://cloud.ibm.com/vpc-ext/storage/storageVolumes

Expected results:
Volumes should be deleted.

Comment 1 Jan Safranek 2022-01-28 15:32:18 UTC
The driver should tag the volumes it creates so that `openshift-install destroy cluster` can find and delete them. For example, on AWS we tag all volumes with "kubernetes.io/cluster/<cluster-id>: owned", and the installer deletes those. The AWS EBS CSI driver receives the tag as a command-line parameter, which our AWS EBS CSI driver operator provides.
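A minimal sketch of the ownership check described above, assuming volumes expose their tags as a simple key/value map. The `Volume` type and both helper functions are hypothetical, not the installer's or any cloud SDK's real API; only the "kubernetes.io/cluster/<cluster-id>: owned" convention comes from the comment.

```go
package main

import (
	"fmt"
	"sort"
)

// Volume is a hypothetical, simplified view of a cloud block volume.
// Real cloud SDKs expose tags in their own formats.
type Volume struct {
	ID   string
	Tags map[string]string
}

// clusterOwnedTag builds the AWS-style ownership tag key
// "kubernetes.io/cluster/<cluster-id>"; the expected value is "owned".
func clusterOwnedTag(clusterID string) string {
	return "kubernetes.io/cluster/" + clusterID
}

// ownedVolumeIDs returns the sorted IDs of volumes tagged as owned by
// clusterID, i.e. the only volumes an uninstaller should delete.
func ownedVolumeIDs(vols []Volume, clusterID string) []string {
	key := clusterOwnedTag(clusterID)
	var ids []string
	for _, v := range vols {
		// Lookups on a nil map return "", so untagged volumes are skipped.
		if v.Tags[key] == "owned" {
			ids = append(ids, v.ID)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	vols := []Volume{
		{ID: "vol-a", Tags: map[string]string{"kubernetes.io/cluster/mycluster-abc12": "owned"}},
		{ID: "vol-b", Tags: map[string]string{"kubernetes.io/cluster/other-xyz99": "owned"}},
		{ID: "vol-c"}, // untagged: never touched
	}
	fmt.Println(ownedVolumeIDs(vols, "mycluster-abc12")) // prints [vol-a]
}
```

Filtering on an explicit ownership tag, rather than on name prefixes, keeps the uninstaller from deleting volumes that belong to another cluster in the same account.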

Comment 2 Arashad Ahamad 2022-01-31 06:40:39 UTC
This is not supported as of now; we do support this on managed clusters.

Can we discuss this? Supporting it requires changes, and for that we need to understand the `openshift-install destroy cluster` process.

Comment 3 Arashad Ahamad 2022-02-07 13:02:04 UTC
We need to add support in the installer and the CSI driver, similar to https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go


So there will be three PRs:
1- Add the file `destroy/ibm/disk.go` to the installer.
2- In https://github.com/kubernetes-sigs/ibm-vpc-block-csi-driver, read `--extra-labels` and add the tag when creating the volume in VPC.
3- Update the CSI driver deployment file to pass the tag details, i.e. https://github.com/openshift/gcp-pd-csi-driver-operator/blob/223a251c3ba39d8af605258d14794b32a5cfafda/assets/controller.yaml#L51
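Step 2 can be sketched as below, assuming the IBM driver adopts the same comma-separated `key=value` format that the GCP PD CSI driver uses for `--extra-labels`. The parsing function is hypothetical, and how the resulting map would be turned into IBM VPC volume tags is not shown (per the later comments, VPC volume tagging was not yet available).

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseExtraLabels parses a value such as "k1=v1,k2=v2" into a map.
// The format is an assumption modelled on the GCP PD CSI driver flag.
func parseExtraLabels(s string) (map[string]string, error) {
	labels := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return labels, nil
	}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 || strings.TrimSpace(kv[0]) == "" {
			return nil, fmt.Errorf("invalid label %q, expected key=value", pair)
		}
		labels[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return labels, nil
}

func main() {
	// Hypothetical flag wiring: the operator would inject the value into
	// the controller Deployment args (step 3).
	extra := flag.String("extra-labels", "",
		"comma-separated key=value labels to add to every created volume")
	flag.Parse()

	labels, err := parseExtraLabels(*extra)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels)
}
```

Rejecting malformed pairs at startup, instead of silently dropping them, means a misconfigured operator fails loudly before any untagged volume is created.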


This will take time; we will raise the PRs by the end of next week.


Can you please provide an environment to test it?

Comment 4 Jeff Nowicki 2022-02-07 17:23:09 UTC
Some additional OpenShift installer code references for you (related to this issue):
- implement a Go file similar to the GCP one under installer/pkg/destroy/ibmcloud/disk.go (https://github.com/openshift/installer/blob/master/pkg/destroy/gcp/disk.go)
- implement a destroyDisks function (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/gcp/disk.go#L86)
- update the ibmcloud uninstaller to invoke that function at the appropriate point in the tear down process (https://github.com/openshift/installer/blob/3f318d7049d5f4b6f98211b4b899fdb43b1f3542/pkg/destroy/ibmcloud/ibmcloud.go#L94)
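A rough sketch of the `destroyDisks` shape referenced above: list the cluster's disks, try to delete each one, and return a combined error so the uninstaller can call it again later. The `diskClient` interface and the fake client are hypothetical stand-ins, not the installer's or the IBM Cloud SDK's real types. `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// diskClient is a hypothetical abstraction over the IBM Cloud VPC API.
type diskClient interface {
	ListClusterDisks(clusterID string) ([]string, error)
	DeleteDisk(id string) error
}

// destroyDisks deletes every disk found for the cluster. Failures do not
// stop the loop; they are joined into one error for the caller to retry.
func destroyDisks(c diskClient, clusterID string) error {
	ids, err := c.ListClusterDisks(clusterID)
	if err != nil {
		return fmt.Errorf("listing disks: %w", err)
	}
	var errs []error
	for _, id := range ids {
		if err := c.DeleteDisk(id); err != nil {
			errs = append(errs, fmt.Errorf("deleting disk %s: %w", id, err))
		}
	}
	return errors.Join(errs...) // nil when every deletion succeeded
}

// fakeClient is an in-memory test double: true means deletable.
type fakeClient struct {
	disks   map[string]bool
	deleted []string
}

func (f *fakeClient) ListClusterDisks(string) ([]string, error) {
	var ids []string
	for id := range f.disks {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids, nil
}

func (f *fakeClient) DeleteDisk(id string) error {
	if !f.disks[id] {
		return errors.New("volume busy")
	}
	f.deleted = append(f.deleted, id)
	delete(f.disks, id)
	return nil
}

func main() {
	c := &fakeClient{disks: map[string]bool{"vol-1": true, "vol-2": false}}
	err := destroyDisks(c, "mycluster")
	fmt.Println(c.deleted) // prints [vol-1]
	fmt.Println(err)       // prints deleting disk vol-2: volume busy
}
```

The sketch does not filter on reclaim policy, which matches the conclusion later in this thread that volumes with the Retain policy can be deleted as well.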

Comment 5 Arashad Ahamad 2022-02-14 11:56:28 UTC
I started coding it, but when I checked VPC support for volume tagging I found that it is not supported, so I opened a JIRA for them: https://jiracloud.swg.usma.ibm.com:8443/browse/API-3298

Comment 6 Arashad Ahamad 2022-02-18 13:47:11 UTC
This requires support from the IBM VPC side. I opened the JIRA; once that is fixed, we will add support in the CSI driver. This will take time; the tentative date is 3/15.

Comment 7 Mike Pytlak 2022-02-23 20:12:26 UTC
The doc update for this is being tracked in https://github.com/openshift/openshift-docs/pull/42293, specifically in the topic that details how to uninstall a cluster.

Comment 8 Arashad Ahamad 2022-03-25 10:33:28 UTC
The tentative date is now mid-April.

Comment 10 Arashad Ahamad 2022-05-12 08:25:04 UTC
Due to other priorities we could not make this, so the next tentative date is 31st May. We also depend on creating a cluster with the installer, which we are currently unable to do, and that will impact this delivery as well.

The cluster creation issue was discussed at https://coreos.slack.com/archives/C01U40AM37F/p1652074667322439 and a bugzilla was opened: https://bugzilla.redhat.com/show_bug.cgi?id=2083006

Comment 11 Arashad Ahamad 2022-06-03 09:51:16 UTC
sameer.shaikh is working on this issue

Comment 12 Sameer Shaikh 2022-06-09 12:03:35 UTC
Jan Safranek, do we know whether GCP and AWS currently skip volumes with the Retain reclaim policy? I don't see any filtering of such volumes.

Comment 13 Sameer Shaikh 2022-06-10 06:43:27 UTC
As discussed with jsafrane, we don't need to worry about volumes that have the "Retain" reclaim policy. All volumes can be deleted.

Slack discussion -- https://coreos.slack.com/archives/C01U40AM37F/p1654776475215769

As discussed with cschaefe, we will follow the existing flow: the installer does not exit when volume deletion is stuck, but instead reports an error that suggests how to clean up the volumes manually.

For example -- 

"Failed to delete disk name=myvolume-update-sam, id=r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8.If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold r026-bfdfbb30-69ec-4afc-9d10-151eef9dfce8"

Slack discussion -- https://coreos.slack.com/archives/C01U40AM37F/p1654836366652279?thread_ts=1654767524.676699&cid=C01U40AM37F

Comment 15 Chao Yang 2022-06-28 12:47:55 UTC
Volumes are deleted after destroying the cluster.
Build: openshift/ibm-vpc-block-csi-driver#14, openshift/ibm-vpc-node-label-updater#11, openshift/ibm-vpc-block-csi-driver-operator#40

Comment 16 Sameer Shaikh 2022-06-29 14:41:45 UTC
All PRs are merged.

Comment 19 errata-xmlrpc 2022-08-23 19:39:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069