Bug 2069389 - Consumer cluster deletion succeeds (even with existing PVCs) but provider still lists the storageconsumer and related resources [NEEDINFO]
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dhruv Bindra
QA Contact: suchita
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-28 20:09 UTC by Neha Berry
Modified: 2023-08-09 17:00 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
nberry: needinfo? (kbader)


Attachments
auth list showing 3 onboarded clients (6.27 KB, text/plain), 2022-03-28 20:09 UTC, Neha Berry


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-osd-deployer pull 152 0 None open Uninstall will wait until all PVs using OCS storageClasses are removed 2022-03-30 04:38:40 UTC
Red Hat Issue Tracker SDA-5896 0 None None None 2022-05-09 13:09:21 UTC

Description Neha Berry 2022-03-28 20:09:05 UTC
Created attachment 1868837 [details]
auth list showing 3 onboarded clients

Description of problem:
=====================================
To offboard a consumer cluster from the provider, we have tested the uninstall add-on flow, whereby the storageconsumers and other resources are deleted from the provider.

However, when we delete an onboarded consumer directly from the OCM UI (OCM UI -> Delete cluster), offboarding is not triggered and the storageconsumer and other resources still exist on the provider.

Note that as part of cluster deletion, the UI first shows the add-on in the "Uninstalling" phase, so offboarding should have happened.


Version-Release number of selected component (if applicable):
=================================================================
provider images
==================
 oc describe csv ocs-osd-deployer.v2.0.0 |grep -i image
    Mediatype:   image/svg+xml
                Image:  gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.0-2
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.0-2
 oc get csv -n openshift-storage -o json ocs-operator.v4.10.0 | jq '.metadata.labels["full_version"]'
"4.10.0-206"

consumer
==========
oc describe csv ocs-osd-deployer.v2.0.0 |grep -i image                                              
    Mediatype:   image/svg+xml
                Image:  gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.0-5
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.0-5


How reproducible:
======================
Tested on 3 consumer clusters on the same ODF to ODF setup

Steps to Reproduce:
1. Created an ODF to ODF cluster using the steps provided here[1]
[1] -https://docs.google.com/document/d/1ehNBscWgLGNYqnnZUp6RPnkR9ByYU69BgXvr_z2n5sE/edit#heading=h.41dqse7bmiv5

2. Onboarded 3 consumers and created PVCs from each of the 3
3. With existing PVCs, started Delete cluster from the OCM UI, expecting it to fail since PVCs existed

Actual results:
===================
1. Consumer cluster deletion succeeds
2. On the provider side, all 3 storageconsumers and corresponding resources are still intact



Expected results:
=====================
If a consumer cluster is permanently deleted from the OCM UI, the corresponding provider resources should also be deleted to free up the space.

Alternatively, should the provider have a mechanism to poll for deleted consumers and clean up their storageconsumers?



Additional info:

After deletion from the UI, only the provider cluster exists:


rosa list clusters
ID                                NAME            STATE
1r7eu5ehl2m97pad70lecuk9uljcmaa4  sgatfane-28pr1  ready

However, the corresponding storageconsumer resources are not deleted:
-----------------------------------
date --utc; oc get storageconsumer,cephblockpool,cephfilesystemsubvolumegroup
Mon Mar 28 08:07:23 PM UTC 2022
NAME                                                                                    AGE
storageconsumer.ocs.openshift.io/storageconsumer-0acdb819-f6bb-42ba-b80a-c0e9bd88a40e   6h3m
storageconsumer.ocs.openshift.io/storageconsumer-223e3a6c-205d-40ce-861e-2021a7ac4b62   6h20m
storageconsumer.ocs.openshift.io/storageconsumer-99d4f738-4241-401c-9b40-ba7eaafba343   6h11m

NAME                                                                                            AGE
cephblockpool.ceph.rook.io/cephblockpool-storageconsumer-0acdb819-f6bb-42ba-b80a-c0e9bd88a40e   6h3m
cephblockpool.ceph.rook.io/cephblockpool-storageconsumer-223e3a6c-205d-40ce-861e-2021a7ac4b62   6h20m
cephblockpool.ceph.rook.io/cephblockpool-storageconsumer-99d4f738-4241-401c-9b40-ba7eaafba343   6h11m
cephblockpool.ceph.rook.io/ocs-storagecluster-cephblockpool                                     14h

NAME                                                                                                                          AGE
cephfilesystemsubvolumegroup.ceph.rook.io/cephfilesystemsubvolumegroup-storageconsumer-0acdb819-f6bb-42ba-b80a-c0e9bd88a40e   6h4m
cephfilesystemsubvolumegroup.ceph.rook.io/cephfilesystemsubvolumegroup-storageconsumer-223e3a6c-205d-40ce-861e-2021a7ac4b62   6h20m
cephfilesystemsubvolumegroup.ceph.rook.io/cephfilesystemsubvolumegroup-storageconsumer-99d4f738-4241-401c-9b40-ba7eaafba343   6h11m

-----------------------


Before deletion
------------------
rosa list clusters
ID                                NAME            STATE
1r7eu5ehl2m97pad70lecuk9uljcmaa4  sgatfane-28pr1  ready
1r7ieje1qbl46k94cposunuijk81ksvf  sgatfane-28c3   ready
1r7kcdde31vldf7oi4e3r91f8jimbapl  jijoyc1         ready
1r7lbvnvpqbdlqmrro6uncn6aoldemm6  jijoyc2         ready


sh-4.4$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-storageconsumer-99d4f738-4241-401c-9b40-ba7eaafba343
[
    {
        "name": "csi-vol-a240d41d-aea6-11ec-ae4e-0a580a800220"
    }
]
sh-4.4$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-storageconsumer-223e3a6c-205d-40ce-861e-2021a7ac4b62
[
    {
        "name": "csi-vol-5bcd36c7-aebc-11ec-804c-0a580a83003c"
    },
    {
        "name": "csi-vol-5be9cf1d-aebc-11ec-804c-0a580a83003c"
    }
]
sh-4.4$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-storageconsumer-0acdb819-f6bb-42ba-b80a-c0e9bd88a40e
[]


sh-4.4$ rbd ls -p cephblockpool-storageconsumer-99d4f738-4241-401c-9b40-ba7eaafba343
csi-vol-8f1e897a-aeaf-11ec-9f4e-0a580a80021f
csi-vol-c2e87e4a-aeb6-11ec-9f4e-0a580a80021f
csi-vol-c51c6faa-aeae-11ec-9f4e-0a580a80021f
csi-vol-cdf10ef2-aeb3-11ec-9f4e-0a580a80021f
csi-vol-e8749ded-aead-11ec-9f4e-0a580a80021f
sh-4.4$ rbd ls -p cephblockpool-storageconsumer-0acdb819-f6bb-42ba-b80a-c0e9bd88a40e
csi-vol-1a7800cd-aeb0-11ec-8556-0a580a830025
csi-vol-c2d9b816-aeb6-11ec-8556-0a580a830025
csi-vol-cdcbdd4e-aeb3-11ec-8556-0a580a830025
csi-vol-e872570b-aead-11ec-8556-0a580a830025
csi-vol-f6612c51-aeb0-11ec-8556-0a580a830025
csi-vol-fc78b9d7-aeb1-11ec-8556-0a580a830025
sh-4.4$ rbd ls -p cephblockpool-storageconsumer-223e3a6c-205d-40ce-861e-2021a7ac4b62
csi-vol-4baf67df-aeb8-11ec-9c01-0a580a83003b

Comment 2 Dhruv Bindra 2022-03-30 04:38:40 UTC
I tried to reproduce the bug but I was not able to
Few things which I found:
Your provider cluster is using an older image of deployer: quay.io/osd-addons/ocs-osd-deployer:2.0.0-2 and your consumer cluster is using a new image of deployer: quay.io/osd-addons/ocs-osd-deployer:2.0.0-5
The doc that you are using has some steps that do not need to be followed now as the deployer was updated, I have added comments to the doc.

The behavior I observed while reproducing the bug:
The deployer doesn't allow uninstallation while PVCs are using OCS storage classes.
As soon as I delete the PVCs using OCS storage classes, consumer offboarding starts, and the entire openshift-storage namespace is deleted after some time.
When I checked the provider cluster for the storageConsumer resource, the resource still existed for the consumer that was offboarded.
After debugging, I found that a PV was still utilizing storage on the consumer cluster via the cephfs storage class (this PV was created when I created PVCs with the OCS storage classes, i.e. cephrbd and cephfs).
When I manually deleted the PV on the consumer and deleted the corresponding subvolume on the provider, the storageConsumer resource was removed.

So the deployer should wait to uninstall until there is no PV using OCS StorageClasses, rather than checking PVCs; I have raised a PR for that: https://github.com/red-hat-storage/ocs-osd-deployer/pull/152
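A minimal sketch of the PV check this fix implies (the storage class name pattern below is an assumption based on typical ODF consumer deployments; adjust it to the cluster's actual OCS classes):

# List PVs still backed by OCS storage classes on the consumer.
oc get pv -o json | jq -r '
  .items[]
  | select(.spec.storageClassName // "" | test("ocs-storagecluster"))
  | "\(.metadata.name)\t\(.spec.storageClassName)\t\(.status.phase)"'

Uninstall should only proceed once this list comes back empty.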

Comment 6 Subham Rai 2022-05-09 10:01:51 UTC
If I understand correctly, the problem is that when we delete the entire consumer cluster instead of following the offboarding process, the consumer resources on the provider side still exist.

IIRC, there will be stale blockPools, filesystemsubvolumegroups, and cephClients (confirmed with Dhruv).

We need to delete (see the cleanup sketch after this list):
1. The blockPool, in this case `cephblockpool-storageconsumer-326dfd52-773c-4c72-ac1c-6576380bfe37`; notice the block pool name ends with the storageconsumer name.
2. The filesystemsubvolumegroup, `cephfilesystemsubvolumegroup-storageconsumer-326dfd52-773c-4c72-ac1c-6576380bfe37`; notice the subvolumegroup name also ends with the storageconsumer name.
3. The cephClients; to delete the cephClients linked to a particular consumer, list the cephClients and check the annotation with key `StorageConsumerAnnotation`, which will contain the consumer name.
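A hedged manual-cleanup sketch (the consumer UUID is the illustrative one from the list above; the literal annotation key is an assumption derived from the `StorageConsumerAnnotation` constant mentioned and should be verified against the actual cephClient CRs):

CONSUMER=storageconsumer-326dfd52-773c-4c72-ac1c-6576380bfe37
oc delete cephblockpool -n openshift-storage cephblockpool-$CONSUMER
oc delete cephfilesystemsubvolumegroup -n openshift-storage cephfilesystemsubvolumegroup-$CONSUMER
# Annotation key below is assumed; confirm with `oc get cephclient -o yaml`.
oc get cephclient -n openshift-storage -o json \
  | jq -r --arg c "$CONSUMER" \
      '.items[] | select(.metadata.annotations["ocs.openshift.io.storageconsumer"] == $c) | .metadata.name' \
  | xargs -r oc delete cephclient -n openshift-storage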

Comment 10 Dhruv Bindra 2022-09-20 07:26:58 UTC
Moving the BZ to ON_QA as the tracker issue and the deployer fix were merged.

Comment 17 suchita 2023-05-08 08:15:39 UTC
This is verified on the earlier deployer version (v2.0.10).
No storageconsumer is observed on the provider once the PVCs are deleted from the consumer and the openshift-storage project is deleted successfully from the consumer.

