Description of problem
======================
I made a mistake during setup of an external storage system (because of BZ 2088506), which resulted in a misconfigured ceph object store. An attempt to remove this cluster failed on the noobaa webhook.

Version-Release number of selected component
============================================
OCP 4.11.0-0.nightly-2022-05-18-171831
ODF 4.11.0-75

How reproducible
================
1/1

Steps to Reproduce
==================
1. When creating the StorageSystem, use the "Connect an external storage platform" option and select "Red Hat Ceph Storage"
2. Download the ceph-external-cluster-details-exporter.py script
3. Run the exporter, specifying the rgw endpoint via a fully qualified hostname, e.g. `--rgw-endpoint ceph-5.mbukatov-ceph01.qe.example.com:8080`
4. Load the json from the exporter and create the cluster
5. Observe the failure described in BZ 2088506 (openshift-storage/ocs-external-storagecluster-cephobjectstore fails to reconcile)
6. Try to remove the storage system to retry.

Actual results
==============
StorageCluster is deleting:

```
$ oc get StorageCluster -n openshift-storage
NAME                          AGE     PHASE      EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   3h42m   Deleting   true       2022-05-19T14:21:06Z   4.11.0
```

But this process gets stuck on a noobaa webhook failure:

```
18m  Warning  ReconcileFailed   storagesystem/ocs-external-storagecluster-storagesystem  Waiting for storagecluster.ocs.openshift.io/v1 ocs-external-storagecluster to be deleted
18m  Warning  UninstallPending  storagecluster/ocs-external-storagecluster               uninstall: Failed to delete NooBaa system noobaa : admission webhook "admissionwebhook.noobaa.io" denied the request: Deletion of NooBaa resource is prohibited
```

Expected results
================
Removal of the storagesystem and its StorageCluster proceeds with success.
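For reference, the exporter invocation from step 3 looked roughly like this. The `--rgw-endpoint` value is the one from this report; the data pool name and output redirection are illustrative assumptions, not taken from the actual reproduction run:

```shell
# Run the exporter on a node with access to the external ceph cluster.
# "rbd" is a hypothetical pool name; substitute the real RBD data pool.
python3 ceph-external-cluster-details-exporter.py \
    --rbd-data-pool-name rbd \
    --rgw-endpoint ceph-5.mbukatov-ceph01.qe.example.com:8080 \
    > external-cluster-details.json
```

The json output is then pasted into the "Connect an external storage platform" wizard in step 4.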
Additional info
===============
Noobaa itself is in "Configuring" phase:

```
$ oc get noobaa -n openshift-storage
NAME     MGMT-ENDPOINTS                  S3-ENDPOINTS                    STS-ENDPOINTS                   IMAGE                                                                                                                  PHASE         AGE
noobaa   ["https://10.1.160.55:31938"]   ["https://10.1.160.90:32326"]   ["https://10.1.160.90:32050"]   quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:c994b32b55a98deaeaae0a46d3b474299d1b5a1600ac8e622b00af0b0bca5678   Configuring   4h
```

I wonder why the noobaa webhook would block a removal request when the object store is broken, which makes any noobaa resource present on the cluster unusable anyway. But I guess it's better to be careful. I also wonder whether there is a direct way to fix this scenario without a reinstall.
This is similar to BZ 1943527 (Unable to remove broken storage cluster), but the use case and root cause seem to be different.
QE workaround (to avoid this bug):
- edit the noobaa CR and add the field `allowNoobaaDeletion: true` in the `cleanupPolicy` section
- remove the storage system
- remove the finalizers in the storagecluster CR
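The workaround steps above can be sketched with `oc` roughly as follows. Resource names come from this report; the exact patch paths are a sketch, assuming `cleanupPolicy.allowNoobaaDeletion` sits under the noobaa CR's `spec`:

```shell
# Step 1: let the noobaa admission webhook accept the deletion
# (assumes the field lives at spec.cleanupPolicy.allowNoobaaDeletion)
oc patch noobaa noobaa -n openshift-storage --type merge \
    -p '{"spec":{"cleanupPolicy":{"allowNoobaaDeletion":true}}}'

# Step 2: remove the storage system
oc delete storagesystem ocs-external-storagecluster-storagesystem \
    -n openshift-storage

# Step 3: if the storagecluster is still stuck Deleting, clear its finalizers
oc patch storagecluster ocs-external-storagecluster -n openshift-storage \
    --type merge -p '{"metadata":{"finalizers":[]}}'
```

Note that clearing finalizers bypasses normal cleanup, so it should only be used on a cluster that is being torn down anyway, as here.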
Using OCP 4.11.0-0.nightly-2022-08-19-091806 with ODF 4.11.0-137 (RC3) and an external stretched ceph 16.2.8-84.el8cp (RHCS 5.2), I performed the use case from BZ 2088506 and was then able to remove the StorageCluster without any problems.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156