Description of problem
======================
I made a mistake during setup of an external storage system (because of BZ 2088506), which resulted in a misconfigured ceph object store. An attempt to remove this cluster failed on the noobaa webhook.

Version-Release number of selected component
============================================
OCP 4.11.0-0.nightly-2022-05-18-171831
ODF 4.11.0-75

How reproducible
================
1/1

Steps to Reproduce
==================
1. When creating the StorageSystem, use the "Connect an external storage platform" option and select "Red Hat Ceph Storage"
2. Download the ceph-external-cluster-details-exporter.py script
3. Run the exporter, specifying the rgw endpoint via a fully qualified hostname, e.g. `--rgw-endpoint ceph-5.mbukatov-ceph01.qe.example.com:8080`
4. Load the json from the exporter and create the cluster
5. Observe the failure described in BZ 2088506 (openshift-storage/ocs-external-storagecluster-cephobjectstore fails to reconcile)
6. Try to remove the storage system to retry.

Actual results
==============
StorageCluster is deleting:

```
$ oc get StorageCluster -n openshift-storage
NAME                          AGE     PHASE      EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   3h42m   Deleting   true       2022-05-19T14:21:06Z   4.11.0
```

But this process gets stuck on a noobaa webhook failure:

```
18m  Warning  ReconcileFailed   storagesystem/ocs-external-storagecluster-storagesystem  Waiting for storagecluster.ocs.openshift.io/v1 ocs-external-storagecluster to be deleted
18m  Warning  UninstallPending  storagecluster/ocs-external-storagecluster               uninstall: Failed to delete NooBaa system noobaa : admission webhook "admissionwebhook.noobaa.io" denied the request: Deletion of NooBaa resource is prohibited
```

Expected results
================
Removal of the storagesystem and its StorageCluster proceeds with success.
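For reference, the exporter invocation from step 3 looked roughly like this. The `--rgw-endpoint` value is the one from this report; the data pool name and output redirection are illustrative assumptions, not taken from the actual reproduction run:

```shell
# Run the exporter on a node with access to the external ceph cluster.
# "rbd" is a hypothetical pool name; substitute the real RBD data pool.
python3 ceph-external-cluster-details-exporter.py \
    --rbd-data-pool-name rbd \
    --rgw-endpoint ceph-5.mbukatov-ceph01.qe.example.com:8080 \
    > external-cluster-details.json
```

The json output is then pasted into the "Connect an external storage platform" wizard in step 4.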
Additional info
===============
Noobaa itself is in "Configuring" phase:

```
$ oc get noobaa -n openshift-storage
NAME     MGMT-ENDPOINTS                  S3-ENDPOINTS                    STS-ENDPOINTS                   IMAGE                                                                                                                  PHASE         AGE
noobaa   ["https://10.1.160.55:31938"]   ["https://10.1.160.90:32326"]   ["https://10.1.160.90:32050"]   quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:c994b32b55a98deaeaae0a46d3b474299d1b5a1600ac8e622b00af0b0bca5678   Configuring   4h
```

I wonder why the noobaa webhook would block a removal request when the object store is broken, which makes any noobaa resource present on the cluster unusable anyway. But I guess it's better to be careful. I also wonder whether there is a direct way to fix this scenario without a reinstall.
This is similar to BZ 1943527 (Unable to remove broken storage cluster), but the use case and root cause seem to be different.
QE workaround (to avoid this bug):
- edit the noobaa CR and add the field `allowNoobaaDeletion: true` in the `cleanupPolicy` section
- remove the storage system
- remove the finalizers in the storagecluster CR
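The workaround steps above can be sketched with `oc` roughly as follows. Resource names come from this report; the exact patch paths are a sketch, assuming `cleanupPolicy.allowNoobaaDeletion` sits under the noobaa CR's `spec`:

```shell
# Step 1: let the noobaa admission webhook accept the deletion
# (assumes the field lives at spec.cleanupPolicy.allowNoobaaDeletion)
oc patch noobaa noobaa -n openshift-storage --type merge \
    -p '{"spec":{"cleanupPolicy":{"allowNoobaaDeletion":true}}}'

# Step 2: remove the storage system
oc delete storagesystem ocs-external-storagecluster-storagesystem \
    -n openshift-storage

# Step 3: if the storagecluster is still stuck Deleting, clear its finalizers
oc patch storagecluster ocs-external-storagecluster -n openshift-storage \
    --type merge -p '{"metadata":{"finalizers":[]}}'
```

Note that clearing finalizers bypasses normal cleanup, so it should only be used on a cluster that is being torn down anyway, as here.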
Using OCP 4.11.0-0.nightly-2022-08-19-091806 with ODF 4.11.0-137 (RC3) and an external stretched ceph 16.2.8-84.el8cp (RHCS 5.2), I performed the use case from BZ 2088506 and was then able to remove the StorageCluster without any problems.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156