Bug 2088587 - Removal of external storage system with misconfigured cephobjectstore fails on noobaa webhook
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Utkarsh Srivastava
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-19 18:25 UTC by Martin Bukatovic
Modified: 2023-08-09 16:49 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:53:39 UTC
Embargoed:


Links

| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | noobaa noobaa-operator pull 929 | 0 | None | Merged | Add webhook validation on noobaa deletion | 2022-06-14 08:16:09 UTC |
| Github | noobaa noobaa-operator pull 936 | 0 | None | Merged | [Backport to 5.11] Add webhook validation on noobaa deletion | 2022-06-20 11:28:36 UTC |
| Github | red-hat-storage ocs-operator pull 1698 | 0 | None | Merged | fix noobaa uninstall process | 2022-06-14 08:16:15 UTC |
| Github | red-hat-storage ocs-operator pull 1717 | 0 | None | Merged | Bug 2088587: backport to 4.11 - fix noobaa uninstall | 2022-06-21 12:53:41 UTC |
| Red Hat Product Errata | RHSA-2022:6156 | 0 | None | None | None | 2022-08-24 13:53:49 UTC |

Description Martin Bukatovic 2022-05-19 18:25:10 UTC
Description of problem
======================

I made a mistake during the setup of an external storage system (because of
BZ 2088506), which resulted in a misconfigured Ceph object store. An attempt to
remove this cluster failed on the noobaa webhook.

Version-Release number of selected component
============================================

OCP 4.11.0-0.nightly-2022-05-18-171831
ODF 4.11.0-75

How reproducible
================

1/1

Steps to Reproduce
==================

1. When creating StorageSystem, use "Connect an external storage platform"
   option and select "Red Hat Ceph Storage"
2. Download the ceph-external-cluster-details-exporter.py script
3. Run the exporter, specifying the rgw endpoint via a fully qualified hostname, e.g.
   `--rgw-endpoint ceph-5.mbukatov-ceph01.qe.example.com:8080`
4. Load the JSON produced by the exporter and create the cluster
5. Observe failure described in BZ 2088506
   (openshift-storage/ocs-external-storagecluster-cephobjectstore fails to
   reconcile)
6. Try to remove storage system to retry.

Actual results
==============

StorageCluster is deleting:

```
$ oc get StorageCluster -n openshift-storage
NAME                          AGE     PHASE      EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   3h42m   Deleting   true       2022-05-19T14:21:06Z   4.11.0
```

But this process gets stuck on a noobaa webhook failure:

```
18m         Warning   ReconcileFailed      storagesystem/ocs-external-storagecluster-storagesystem       Waiting for storagecluster.ocs.openshift.io/v1 ocs-external-storagecluster to be deleted
18m         Warning   UninstallPending     storagecluster/ocs-external-storagecluster                    uninstall: Failed to delete NooBaa system noobaa : admission webhook "admissionwebhook.noobaa.io" denied the request: Deletion of NooBaa resource is prohibited
```
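
To check whether an uninstall is blocked by this webhook rather than by something else, the Warning events can be filtered for the denial message. This is a minimal sketch, assuming the message text matches the one quoted above. `webhook_denials` is a hypothetical helper name; it reads event text from standard input, so it can also be tried on saved output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep only lines that report a denial by the NooBaa
# admission webhook (fixed-string match on the text seen in this report).
# "|| true" keeps the script alive under "set -e" when nothing matches.
webhook_denials() {
  grep -F 'admission webhook "admissionwebhook.noobaa.io" denied the request' || true
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get events -n openshift-storage --field-selector type=Warning | webhook_denials
```

An empty result suggests that the deletion is stuck for a different reason than the one described in this bug.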

Expected results
================

Removal of the storagesystem and its StorageCluster succeeds.

Additional info
===============

Noobaa itself is in "configuring" phase:

```
$ oc get noobaa -n openshift-storage
NAME     MGMT-ENDPOINTS                  S3-ENDPOINTS                    STS-ENDPOINTS                   IMAGE                                                                                                            PHASE         AGE
noobaa   ["https://10.1.160.55:31938"]   ["https://10.1.160.90:32326"]   ["https://10.1.160.90:32050"]   quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:c994b32b55a98deaeaae0a46d3b474299d1b5a1600ac8e622b00af0b0bca5678   Configuring   4h
```

I wonder why the noobaa webhook would block a removal request when the object
store is broken, which makes any noobaa resource present on the cluster
unusable anyway. But I guess it's better to be careful.

I also wonder whether there is a direct way to fix this scenario without a
reinstall.

Comment 2 Martin Bukatovic 2022-05-19 18:27:08 UTC
This is similar to BZ 1943527 (Unable to remove broken storage cluster), but the use case and root cause seem to be different.

Comment 4 Martin Bukatovic 2022-05-23 16:25:07 UTC
QE workaround (to avoid this bug):

- edit the noobaa CR and add the field `allowNoobaaDeletion: true` in the `cleanupPolicy` section
- remove the storage system
- remove the finalizers in the storagecluster CR
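
The workaround steps could be scripted roughly as follows. This is a sketch, not a verified procedure. The resource names are taken from the command output earlier in this report. Placing `cleanupPolicy` under `spec` of the NooBaa CR is an assumption. The `run` helper and the `DRY_RUN` variable are hypothetical names introduced here so that the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of the QE workaround; prints the commands unless DRY_RUN=0.
set -euo pipefail

NS="openshift-storage"
# Assumption: cleanupPolicy sits under .spec of the NooBaa CR.
NOOBAA_PATCH='{"spec":{"cleanupPolicy":{"allowNoobaaDeletion":true}}}'
# In a JSON merge patch, setting a field to null removes it.
FINALIZER_PATCH='{"metadata":{"finalizers":null}}'

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Allow deletion of the NooBaa system.
run oc patch noobaa noobaa -n "$NS" --type merge -p "$NOOBAA_PATCH"
# 2. Request removal of the storage system without waiting for it to finish.
run oc delete storagesystem ocs-external-storagecluster-storagesystem -n "$NS" --wait=false
# 3. Drop the finalizers that keep the StorageCluster in the Deleting phase.
run oc patch storagecluster ocs-external-storagecluster -n "$NS" --type merge -p "$FINALIZER_PATCH"
```

Removing finalizers skips the operator's own cleanup, so the last step is only appropriate for a cluster that is going to be discarded or recreated anyway, as in this QE scenario.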

Comment 10 Martin Bukatovic 2022-08-19 17:50:46 UTC
Using OCP 4.11.0-0.nightly-2022-08-19-091806 with ODF 4.11.0-137 (RC3) with external stretched ceph 16.2.8-84.el8cp (RHCS 5.2).

I performed the use case from BZ 2088506 and then I was able to remove StorageCluster without any problems.

Comment 12 errata-xmlrpc 2022-08-24 13:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

