Bug 2088587

Summary: Removal of external storage system with misconfigured cephobjectstore fails on noobaa webhook
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Martin Bukatovic <mbukatov>
Component: Multi-Cloud Object Gateway Assignee: Utkarsh Srivastava <usrivast>
Status: CLOSED ERRATA QA Contact: Martin Bukatovic <mbukatov>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.11 CC: dzaken, etamir, muagarwa, ocs-bugs, odf-bz-bot
Target Milestone: ---   
Target Release: ODF 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-24 13:53:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Martin Bukatovic 2022-05-19 18:25:10 UTC
Description of problem
======================

I made a mistake during setup of an external storage system (because of
BZ 2088506), which resulted in a misconfigured ceph object store. An attempt to
remove this cluster failed on the noobaa webhook.

Version-Release number of selected component
============================================

OCP 4.11.0-0.nightly-2022-05-18-171831
ODF 4.11.0-75

How reproducible
================

1/1

Steps to Reproduce
==================

1. When creating StorageSystem, use "Connect an external storage platform"
   option and select "Red Hat Ceph Storage"
2. Download the ceph-external-cluster-details-exporter.py script
3. Run the exporter specifying the rgw endpoint via a fully qualified hostname, e.g.
   `--rgw-endpoint ceph-5.mbukatov-ceph01.qe.example.com:8080`
4. Load the JSON produced by the exporter and create the cluster
5. Observe failure described in BZ 2088506
   (openshift-storage/ocs-external-storagecluster-cephobjectstore fails to
   reconcile)
6. Try to remove the storage system in order to retry.
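
When scripting step 3, it may help to record whether the rgw endpoint was passed as a hostname or as an IP address, since this report concerns the hostname form. The following is a minimal sketch and is not part of the exporter script; the function name `split_rgw_endpoint` is hypothetical, and it only handles the plain `host:port` form shown above (no IPv6 brackets, no URL scheme).

```python
import ipaddress


def split_rgw_endpoint(endpoint):
    """Split a 'host:port' string and classify the host as 'ip' or 'hostname'.

    Hypothetical helper for test scripts; it does not reproduce any
    validation done by the exporter script itself.
    """
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    try:
        ipaddress.ip_address(host)
        kind = "ip"
    except ValueError:
        # Anything that is not a literal IP address is treated as a hostname.
        kind = "hostname"
    return host, int(port), kind


if __name__ == "__main__":
    print(split_rgw_endpoint("ceph-5.mbukatov-ceph01.qe.example.com:8080"))
```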

Actual results
==============

The StorageCluster is in the Deleting phase:

```
$ oc get StorageCluster -n openshift-storage
NAME                          AGE     PHASE      EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   3h42m   Deleting   true       2022-05-19T14:21:06Z   4.11.0
```

But this process gets stuck on a noobaa webhook failure:

```
18m         Warning   ReconcileFailed      storagesystem/ocs-external-storagecluster-storagesystem       Waiting for storagecluster.ocs.openshift.io/v1 ocs-external-storagecluster to be deleted
18m         Warning   UninstallPending     storagecluster/ocs-external-storagecluster                    uninstall: Failed to delete NooBaa system noobaa : admission webhook "admissionwebhook.noobaa.io" denied the request: Deletion of NooBaa resource is prohibited
```
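
To spot this condition automatically, for example in a QE script that scans event messages, the denial can be matched with a regular expression. This is a minimal sketch that assumes the message wording shown in the event above; the helper name `webhook_denial` is hypothetical, and the wording may differ between versions.

```python
import re

# Matches the "denied the request" wording seen in the UninstallPending event.
DENIAL = re.compile(
    r'admission webhook "(?P<webhook>[^"]+)" denied the request: (?P<reason>.+)'
)


def webhook_denial(message):
    """Return (webhook name, reason) if the message reports a denial, else None."""
    match = DENIAL.search(message)
    if match is None:
        return None
    return match.group("webhook"), match.group("reason").strip()


if __name__ == "__main__":
    message = (
        "uninstall: Failed to delete NooBaa system noobaa : admission webhook "
        '"admissionwebhook.noobaa.io" denied the request: '
        "Deletion of NooBaa resource is prohibited"
    )
    print(webhook_denial(message))
```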

Expected results
================

Removal of the storagesystem and its StorageCluster proceeds with success.

Additional info
===============

Noobaa itself is in "configuring" phase:

```
$ oc get noobaa -n openshift-storage
NAME     MGMT-ENDPOINTS                  S3-ENDPOINTS                    STS-ENDPOINTS                   IMAGE                                                                                                            PHASE         AGE
noobaa   ["https://10.1.160.55:31938"]   ["https://10.1.160.90:32326"]   ["https://10.1.160.90:32050"]   quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:c994b32b55a98deaeaae0a46d3b474299d1b5a1600ac8e622b00af0b0bca5678   Configuring   4h
```

I wonder why the noobaa webhook would block a removal request when the object
store is broken, which makes any noobaa resource present on the cluster unusable
anyway. But I guess it's better to be careful.

I also wonder whether there is a direct way to fix the scenario without
reinstalling.

Comment 2 Martin Bukatovic 2022-05-19 18:27:08 UTC
This is similar to BZ 1943527 (Unable to remove broken storage cluster), but the use case and root cause seem to be different.

Comment 4 Martin Bukatovic 2022-05-23 16:25:07 UTC
QE workaround (to avoid this bug):

- edit the noobaa CR and add the field `allowNoobaaDeletion: true` in the `cleanupPolicy` section
- remove the storage system
- remove the finalizers in the storagecluster CR
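
The workaround above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes `cleanupPolicy` sits under `spec` of the noobaa CR, it reuses the resource names and namespace from the output in the description, and it clears all finalizers with a JSON merge patch. The function name `workaround_commands` is hypothetical. The script only prints the `oc` commands so they can be reviewed before anything is run.

```python
import json
import shlex

NAMESPACE = "openshift-storage"


def workaround_commands(
    namespace=NAMESPACE,
    noobaa="noobaa",
    storagesystem="ocs-external-storagecluster-storagesystem",
    storagecluster="ocs-external-storagecluster",
):
    """Build the oc command lines for the three workaround steps."""
    # Assumption: cleanupPolicy is located under spec in the noobaa CR.
    allow_deletion = {"spec": {"cleanupPolicy": {"allowNoobaaDeletion": True}}}
    # In a JSON merge patch, a null value removes the field.
    drop_finalizers = {"metadata": {"finalizers": None}}
    return [
        ["oc", "patch", "noobaa", noobaa, "-n", namespace,
         "--type", "merge", "-p", json.dumps(allow_deletion)],
        ["oc", "delete", "storagesystem", storagesystem, "-n", namespace],
        ["oc", "patch", "storagecluster", storagecluster, "-n", namespace,
         "--type", "merge", "-p", json.dumps(drop_finalizers)],
    ]


if __name__ == "__main__":
    # Print only; nothing is executed against the cluster.
    for argv in workaround_commands():
        print(shlex.join(argv))
```

Removing finalizers skips the operator's own cleanup, so this is only appropriate for a test cluster that is being torn down anyway.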

Comment 10 Martin Bukatovic 2022-08-19 17:50:46 UTC
Using OCP 4.11.0-0.nightly-2022-08-19-091806 with ODF 4.11.0-137 (RC3) with external stretched ceph 16.2.8-84.el8cp (RHCS 5.2).

I performed the use case from BZ 2088506 and then I was able to remove StorageCluster without any problems.

Comment 12 errata-xmlrpc 2022-08-24 13:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156