Created attachment 1756372 [details]
rook operator log

Description of problem (please be detailed as possible and provide log snippets):
=======================================================================
In an OCS dynamic-mode cluster on VMware, created 2 OBCs and then initiated storagecluster deletion (without deleting the OBCs, to test whether they block uninstall).

Working as expected: until I deleted the NooBaa-based OBC, NooBaa deletion was stuck, halting storagecluster deletion. Once I deleted only the NooBaa OBC, the uninstall progressed towards cephcluster deletion.

>> Not working as expected: even with mode: graceful, storagecluster and cephcluster deletion succeeded although the RGW-based OBC still existed on the cluster. Rook should have checked for its existence and blocked cephcluster deletion, similar to the way it blocks cephcluster deletion when PVCs (cephfs/rbd) exist.

Storagecluster annotation:
-----------------------------
  annotations:
    uninstall.ocs.openshift.io/cleanup-policy: delete
    uninstall.ocs.openshift.io/mode: graceful

======= obc ==========
NAME      STORAGE-CLASS                 PHASE   AGE
obc-rgw   ocs-storagecluster-ceph-rgw   Bound   7s
obcs-nb   openshift-storage.noobaa.io   Bound   45s

Because of the presence of the OBC, the namespace deletion got stuck and I had to manually patch the finalizers of the OBC-related ConfigMap, the Secret, and the OBC itself.

Version of all relevant components (if applicable):
===============================================
OCS = 4.7.0-258.ci
OCP = 4.7.0-0.nightly-2021-02-09-224509

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
=================================================================
Due to the leftovers, we have to manually patch the finalizers in the OBC resources for the namespace deletion to succeed.

Is there any workaround available to the best of your knowledge?
=========================================================
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
==============================
Tested once

Can this issue reproduce from the UI?
=====================================
NA

If this is a regression, please provide more details to justify this:
===================================================
Not sure

Steps to Reproduce:
=======================
1. Create one RGW-based and one NooBaa-based OBC using the UI.
2. Keeping them intact, initiate storagecluster deletion:

   $ date --utc; oc delete storagecluster --all; date --utc
   Thu Feb 11 11:26:44 UTC 2021
   storagecluster.ocs.openshift.io "ocs-storagecluster" deleted
   Thu Feb 11 12:06:43 UTC 2021

3. Storagecluster deletion stays stuck in the Deleting state, since NooBaa is waiting for the OBC bucket to be removed.
4. Delete only the NooBaa OBC from the UI/CLI. NooBaa deletion succeeds and the uninstall progresses to delete the cephcluster.
5. Check the progress of storagecluster deletion: cephcluster deletion does not get stuck for the RGW OBC.

Actual results:
=================
With graceful mode on, storagecluster (and therefore cephcluster) deletion succeeds even when an RGW-based OBC exists.

Expected results:
===================
cephcluster deletion should have been stuck, waiting for the user to delete the RGW OBC first.

Additional info:
===================
Only remaining resources after storagecluster deletion:
-------------------------------------------------
$ oc get cm
NAME      DATA   AGE
obc-rgw   5      133m

[nberry@localhost dynamic-258.ci]$ oc get secret
NAME      TYPE     DATA   AGE
obc-rgw   Opaque   2      134m

$ oc get obc
NAME      STORAGE-CLASS                 PHASE   AGE
obc-rgw   ocs-storagecluster-ceph-rgw   Bound   43m
--------------
======= storagecluster ==========
No resources found in openshift-storage namespace.
--------------
======= cephcluster ==========
No resources found in openshift-storage namespace.
======= PV ====
No resources found

======= backingstore ==========
No resources found in openshift-storage namespace.

======= bucketclass ==========
No resources found in openshift-storage namespace.

======= obc ==========
NAME      STORAGE-CLASS                 PHASE   AGE
obc-rgw   ocs-storagecluster-ceph-rgw   Bound   43m

======= oc get cephobjectstore and user ==========
No resources found in openshift-storage namespace.
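For reference, the manual cleanup mentioned above (removing the finalizers from the leftover OBC, its ConfigMap, and its Secret so that namespace deletion can complete) looks roughly like the sketch below. Resource names match the obc-rgw leftovers shown above; the namespace is an assumption and should be adjusted. Clearing finalizers by hand is a last resort, not a supported procedure:

```shell
# Last-resort cleanup sketch: strip the finalizers that keep the
# leftover OBC resources (and hence the namespace) from being deleted.
# Names match the obc-rgw leftovers above; the namespace is assumed
# and must be adjusted to wherever the OBC was created.
NS=openshift-storage

for kind in obc configmap secret; do
  oc patch "$kind" obc-rgw -n "$NS" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
done
```

A merge patch that sets metadata.finalizers to null removes the finalizer list entirely, letting the already-issued delete proceed.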
Since this is not a regression, it only affects uninstall, and a workaround exists, moving to 4.8.
Per discussion on the upstream design doc for cleanup, this would be a potentially breaking behavior change, so we need to wait for an upstream minor release (v1.7) to make it. Thus, moving to 4.9.
Upstream design doc to solve this issue as well as a related class of issues: https://github.com/rook/rook/pull/7885
Branch https://github.com/red-hat-data-services/rook/tree/release-4.9 has the fix from a resync. Moving to MODIFIED.
Created two OBCs via UI
-------------------------------------------------------------------
[asandler@fedora ~]$ oc get obc -A
No resources found
[asandler@fedora ~]$ oc get obc -A
NAMESPACE   NAME   STORAGE-CLASS                 PHASE   AGE
default     obc1   ocs-storagecluster-ceph-rgw   Bound   15s
default     obc2   openshift-storage.noobaa.io   Bound   3s

Deleting storagecluster
-------------------------------------------------------------
[asandler@fedora ~]$ oc delete storagecluster ocs-storagecluster -n openshift-storage
storagecluster.ocs.openshift.io "ocs-storagecluster" deleted
[asandler@fedora ~]$ oc get storagecluster -A
NAMESPACE           NAME                 AGE     PHASE      EXTERNAL   CREATED AT             VERSION
openshift-storage   ocs-storagecluster   4h15m   Deleting              2021-08-23T20:02:18Z   4.8.0
-----> stuck on Deleting

Deleting NooBaa OBC
-----------------------------------------------------------
[asandler@fedora ~]$ oc get obc -A
NAMESPACE   NAME   STORAGE-CLASS                 PHASE   AGE
default     obc1   ocs-storagecluster-ceph-rgw   Bound   12m
[asandler@fedora ~]$ oc get storagecluster -A
NAMESPACE           NAME                 AGE     PHASE      EXTERNAL   CREATED AT             VERSION
openshift-storage   ocs-storagecluster   4h17m   Deleting              2021-08-23T20:02:18Z   4.8.0
-------> still stuck

Deleting RGW OBC
------------------------------------------------------------------
[asandler@fedora ~]$ oc get obc -A
No resources found
[asandler@fedora ~]$ oc get storagecluster -A
No resources found

* PVCs were deleted too, to prevent the storagecluster from being stuck because of them.

Moving to VERIFIED.
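As the verification above shows, the uninstall now waits until all OBCs (and PVCs) are gone. A quick pre-uninstall check along these lines can confirm nothing will block deletion; this is a hypothetical helper, assumes an authenticated `oc` session, and the storage class names are the default OCS ones:

```shell
# Hypothetical pre-uninstall check: list anything that would block
# storagecluster deletion, i.e. OBCs in any namespace and PVCs bound
# to the default OCS storage classes. Assumes a logged-in `oc` session.
echo "--- OBCs (should be empty before uninstall) ---"
oc get obc -A --no-headers 2>/dev/null || echo "none"

echo "--- PVCs on OCS storage classes (should be empty) ---"
oc get pvc -A --no-headers 2>/dev/null \
  | grep -E 'ocs-storagecluster-(ceph-rbd|cephfs|ceph-rgw)' \
  || echo "none"
```

Running this before `oc delete storagecluster` avoids the deletion hanging indefinitely in the Deleting phase.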
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:5086