Description of problem
======================

When I remove an OCP cluster using `openshift-install destroy cluster --dir=...`, I see that the GCE Disks which were deployed during StorageCluster installation to host data of the Ceph OSD and MON components are still there.

Version-Release number of selected component
============================================

OCP 4.6.0-0.nightly-2020-10-03-051134
OCS 4.6.0-583.ci

How reproducible
================

4/4

Steps to Reproduce
==================

1. Install OCP cluster on GCP platform
2. Install OCS (following the docs, using the "faster" SSD storage class)
3. Destroy the whole cluster via `openshift-install destroy cluster --dir=...`
4. Observe the GCP project where the cluster was installed

Actual results
==============

When I go to the "Disks" page of the "Compute Engine" section for the GCP project where the cluster was installed, I still see a Google Compute Engine Disk for each OSD and MON the cluster was using. See attached screenshot.

Expected results
================

All GCE Disks are removed after cluster destroy.
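For reference, the leftover disks can also be listed from the CLI; a minimal sketch, assuming the gcloud SDK is configured for the project (the `<project-id>` placeholder and the pvc-based name filter are my assumptions, since disks dynamically provisioned via GCE PD typically carry "pvc" in their names):

```
# List all disks still present in the project after `openshift-install destroy`
$ gcloud compute disks list --project <project-id>

# Narrow the list down to disks that look like dynamically provisioned PVs
$ gcloud compute disks list --project <project-id> --filter="name~pvc"
```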
Created attachment 1719474 [details] screenshot #1: list of GCE Disks while the OCP/OCS cluster is running
Created attachment 1719475 [details] screenshot #2: list of GCE Disks after the cluster was destroyed
What's the SC's reclaim policy on deletion? Typically, we do not remove the OSD disks, but we do remove the mons. Moving to ocs-op.
The storage class used for the MON and OSD devices is manually created prior to OCS installation to allow OCS to use GCE SSD disks:

```
$ cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: faster
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
```

This process is described in the "Deploying and managing OpenShift Container Storage using Google Cloud" documentation, section 1.2. "Creating an OpenShift Container Storage Cluster Service in internal mode": https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html/deploying_and_managing_openshift_container_storage_using_google_cloud/deploying-openshift-container-storage-on-google-cloud_gcp#creating-an-openshift-container-storage-service_gcp
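For completeness, a minimal sketch of creating and checking the class (assuming an `oc` session logged into the cluster with admin rights):

```
# Create the storage class and confirm it is registered
$ oc create -f storageclass.yaml
$ oc get storageclass faster
```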
This is not a blocker for OCS 4.6, moving to OCS 4.7. As Seb said, the OSD devices won't be deleted, that has to be done by the admin. I'm not aware of what Rook is expected to do when the mons are removed. OCS Operator has never dealt with removing any devices. Seb?
It's all about the reclaim policy of the SC. If ocs-op does not create the SC, then maybe it's a doc improvement? @Jose nothing special :).
Agreed, if the reclaim policy of the storage class is Delete, then we expect the disks to be deleted. The storage class (or the default) must have the policy set to "Retain", so this is not an OCS or Rook issue, except perhaps a documentation one.
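To see which policy actually took effect, the stored object can be queried directly; a sketch using the "faster" class from the yaml above:

```
# Print the reclaim policy recorded on the storage class object
$ oc get storageclass faster -o jsonpath='{.reclaimPolicy}{"\n"}'
```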
I can confirm that when I use the predefined "standard" storage class instead of the custom SSD one currently explained in the OCS docs, the Google disks are removed as expected during cluster teardown.
Based on the dev evaluation and QE confirmation, moving to the documentation component. The custom storage class we instruct admins to create in our docs is to blame here.
Could we get a dev-approved fix for the "faster" storage class as currently listed in the docs? Do I read it right that we should set `reclaimPolicy` to `Delete`?
Yes, I would expect the reclaim policy to be delete. If you delete the CephCluster CR, it's going to be very difficult to recover your cluster at that point anyway.
Validation of the proposed fix: I installed a CI build of OCS 4.6 manually, adding the `reclaimPolicy: Delete` line into the yaml definition of the SSD storage class (as defined by our documentation [1]):

```
$ cat gcp-sc.bz-1885692.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: faster
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

$ diff gcp-sc.46.yaml gcp-sc.bz-1885692.yaml
8a9
> reclaimPolicy: Delete

$ oc create -f gcp-sc.bz-1885692.yaml
```

And I see that it has the expected effect: after cluster teardown, there are no leftover Google disks.

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_and_managing_openshift_container_storage_using_google_cloud/index?lb_target=preview
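For anyone re-running this validation, it is also worth confirming before teardown that the dynamically provisioned PVs inherited the Delete policy; a sketch, assuming the StorageCluster is installed and its PVCs are bound:

```
# Show the reclaim policy and storage class for every PV in the cluster;
# the mon/OSD volumes backed by the 'faster' class should report Delete
$ oc get pv -o custom-columns='NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,STORAGECLASS:.spec.storageClassName'
```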
FYI pending change in ocs-ci deployment automation: https://github.com/red-hat-storage/ocs-ci/pull/3397
The referenced preview of the "Deploying and managing OpenShift Container Storage using Google Cloud" guide contains an ssd-storageclass.yaml example with reclaimPolicy set to Delete, as expected. Verified.