Bug 1886873 - [OCS 4.6 External/Internal Uninstall] - Storage Cluster deletion stuck indefinitely, "failed to delete object store", remaining users: [noobaa-ceph-objectstore-user]
Summary: [OCS 4.6 External/Internal Uninstall] - Storage Cluster deletion stuck indefi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Raghavendra Talur
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1860670
 
Reported: 2020-10-09 15:06 UTC by Neha Berry
Modified: 2021-06-01 08:51 UTC (History)
14 users

Fixed In Version: 4.6.0-144.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:24:47 UTC
Embargoed:


Attachments
rook-logs (299.38 KB, text/plain)
2020-10-28 19:02 UTC, Neha Berry


Links
System ID: Red Hat Product Errata RHSA-2020:5605, Last Updated: 2020-12-17 06:25:07 UTC

Description Neha Berry 2020-10-09 15:06:43 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1885676


Description of problem (please be as detailed as possible and provide log
snippets):
---------------------------------------------------------------------
External Mode: OCS uninstall is stuck with the following error message; the storage cluster deletion does not complete.


rook-operator logs snip
=====================
2020-10-09 10:50:06.611358 E | ceph-object-controller: failed to delete object store. users for objectstore "ocs-external-storagecluster-cephobjectstore" in namespace "openshift-storage" are not cleaned up. remaining users: [noobaa-ceph-objectstore-user]

ocs-operator log snip
=========================
2020-10-09T10:40:31.859672296Z {"level":"info","ts":"2020-10-09T10:40:31.859Z","logger":"controller_storagecluster","msg":"Uninstall: CephObjectStoreUser not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","CephObjectStoreUser Name":"ocs-external-storagecluster-cephobjectstoreuser"}
2020-10-09T10:40:31.859679762Z {"level":"info","ts":"2020-10-09T10:40:31.859Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting cephObjectStore","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","CephObjectStore Name":"ocs-external-storagecluster-cephobjectstore"}
2020-10-09T10:40:31.878084801Z {"level":"error","ts":"2020-10-09T10:40:31.878Z","logger":"controller-runtime.controller","msg":"Reconciler error","controller":"storagecluster-controller","request":"openshift-storage/ocs-external-storagecluster","error":"Uninstall: Waiting for cephObjectStore ocs-external-storagecluster-cephobjectstore to be deleted","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}



Note: The cluster was in good shape before uninstall was triggered.

Version of all relevant components (if applicable):
-------------------------------------------------------

OCP = 4.7.0-0.ci-2020-10-09-055453

OCS = ocs-operator.v4.6.0-590.ci (ocs-registry:4.6.0-119.ci) - Last build which passed OCS-CI acceptance tests


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
-------------------------------------------------------------

Yes. Unable to proceed with uninstall, which blocks re-install.

Is there any workaround available to the best of your knowledge?
-----------------------------------------------------------------
Not sure


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
------------------------------------------------------------------
3

Is this issue reproducible?
------------------------------
Tested once on this OCS build.

Can this issue be reproduced from the UI?
--------------------------------------
NA

If this is a regression, please provide more details to justify this:
---------------------------------------------------------------------
The uninstall feature has undergone changes in OCS 4.6.

Steps to Reproduce:
---------------------------
1. Create an OCS external mode cluster. The cluster is in Connected state.
2. Trigger OCS uninstall:
   a) Delete all PVCs/OBCs.
   b) Trigger OCS uninstall by deleting the storage cluster from the UI or CLI (a CLI sketch follows these steps). The default annotations were not changed.

      UI -> Installed Operators->OCS-> Storage Cluster-> ocs-external-storagecluster-> Delete Storage cluster
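
For the CLI path in step 2(b), a minimal sketch assuming the external-mode cluster name used in this report (the same delete command appears during verification in comment 21):

```
# Sketch: delete the StorageCluster from the CLI and wait for it to be removed.
oc delete -n openshift-storage storagecluster ocs-external-storagecluster --wait=true

# Follow uninstall progress in the ocs-operator logs (deployment name assumed from the pod listing below).
oc logs -n openshift-storage deployment/ocs-operator -f
```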


Actual results:
--------------------
Storage cluster deletion is stuck because the CephObjectStore deletion is not succeeding:

2020-10-09 10:50:06.611358 E | ceph-object-controller: failed to delete object store. users for objectstore "ocs-external-storagecluster-cephobjectstore" in namespace "openshift-storage" are not cleaned up. remaining users: [noobaa-ceph-objectstore-user]



Expected results:
--------------------

Uninstall should clean up all resources.


Additional info:
--------------------

Fri Oct  9 10:49:59 UTC 2020
--------------
========CSV ======
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.6.0-593.ci   OpenShift Container Storage   4.6.0-593.ci   ocs-operator.v4.6.0-590.ci   Succeeded
--------------
=======PODS ======
NAME                                            READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
csi-cephfsplugin-fflxv                          3/3     Running   0          63m   10.1.160.96    compute-0   <none>           <none>
csi-cephfsplugin-provisioner-7d5fb4d7cd-kv8mb   6/6     Running   0          31m   10.128.2.10    compute-1   <none>           <none>
csi-cephfsplugin-provisioner-7d5fb4d7cd-qxnpj   6/6     Running   0          38m   10.131.0.6     compute-0   <none>           <none>
csi-cephfsplugin-td4jk                          3/3     Running   0          63m   10.1.160.149   compute-2   <none>           <none>
csi-cephfsplugin-wk72s                          3/3     Running   0          63m   10.1.160.76    compute-1   <none>           <none>
csi-rbdplugin-2zpf6                             3/3     Running   0          63m   10.1.160.96    compute-0   <none>           <none>
csi-rbdplugin-krpbg                             3/3     Running   0          63m   10.1.160.76    compute-1   <none>           <none>
csi-rbdplugin-provisioner-54ff9fbd95-mljks      6/6     Running   0          38m   10.131.0.7     compute-0   <none>           <none>
csi-rbdplugin-provisioner-54ff9fbd95-z2wkw      6/6     Running   0          31m   10.128.2.9     compute-1   <none>           <none>
csi-rbdplugin-tjw8n                             3/3     Running   0          63m   10.1.160.149   compute-2   <none>           <none>
noobaa-operator-7b9d89779f-xp42l                1/1     Running   0          31m   10.128.2.7     compute-1   <none>           <none>
ocs-metrics-exporter-79cbfc99d9-p4kfr           1/1     Running   0          31m   10.131.0.17    compute-0   <none>           <none>
ocs-operator-68db4bfc8d-zzwmw                   1/1     Running   0          31m   10.128.2.5     compute-1   <none>           <none>
rook-ceph-operator-59fcc7f5cc-x6q4h             1/1     Running   0          38m   10.131.0.5     compute-0   <none>           <none>
--------------
======= PVC ==========
No resources found in openshift-storage namespace.
--------------
======= storagecluster ==========
NAME                          AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   19h   Deleting   true       2020-10-08T15:23:57Z   4.6.0
--------------
======= cephcluster ==========
NAME                                      DATADIRHOSTPATH   MONCOUNT   AGE   PHASE       MESSAGE                          HEALTH
ocs-external-storagecluster-cephcluster                                19h   Connected   Cluster connected successfully   HEALTH_OK
======= PV ====
No resources found
======= backingstore ==========
No resources found in openshift-storage namespace.
======= bucketclass ==========
No resources found in openshift-storage namespace.
======= obc ==========
No resources found in openshift-storage namespace.


Storagecluster.yaml = ./quay-io-rhceph-dev-ocs-must-gather-sha256-6d8aab40e985fb3e08836349018e833fe489397a27a1fd4e9326a81e2cc54373/namespaces/openshift-storage/oc_output/storagecluster.yaml

Noobaa operator log = quay-io-rhceph-dev-ocs-must-gather-sha256-6d8aab40e985fb3e08836349018e833fe489397a27a1fd4e9326a81e2cc54373/namespaces/openshift-storage/pods/noobaa-operator-7b9d89779f-xp42l/noobaa-operator/noobaa-operator/logs/current.log

Comment 6 Santosh Pillai 2020-10-12 10:12:53 UTC
Seeing the same behavior with `internal-attached devices`

Deleted the user from the toolbox pod using the `--purge-data` flag; the user got deleted after a few minutes and the uninstall completed.

ToolBox:

```
$ oc exec -it rook-ceph-tools-78cdfd976c-q5lhv bash 
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
bash-4.4$ radosgw-admin 
radosgw-admin: -h or --help for usage
bash-4.4$ radosgw-admin  user rm --uid=noobaa-ceph-objectstore-user
could not remove user: unable to remove user, must specify purge data to remove user with buckets
bash-4.4$ radosgw-admin  user rm --uid=noobaa-ceph-objectstore-user --purge-data
bash-4.4$ radosgw-admin  user rm --uid=noobaa-ceph-objectstore-user --purge-data
could not remove user: unable to remove user, user does not exist
bash-4.4$ 
```

Snippet from operator logs:
```
2020-10-12 10:04:41.286377 I | op-mon: parsing mon endpoints: a=172.30.14.78:6789,b=172.30.38.79:6789,c=172.30.169.120:6789
2020-10-12 10:04:41.288213 I | op-k8sutil: ROOK_OBC_WATCH_OPERATOR_NAMESPACE="true" (env var)
2020-10-12 10:04:41.289973 I | ceph-object-controller: no buckets found for objectstore "ocs-storagecluster-cephobjectstore" in namespace "openshift-storage"
2020-10-12 10:04:41.291770 E | ceph-object-controller: failed to delete object store. users for objectstore "ocs-storagecluster-cephobjectstore" in namespace "openshift-storage" are not cleaned up. remaining users: [noobaa-ceph-objectstore-user]
2020-10-12 10:04:48.676171 I | op-mon: parsing mon endpoints: a=172.30.14.78:6789,b=172.30.38.79:6789,c=172.30.169.120:6789
2020-10-12 10:04:48.676242 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2020-10-12 10:04:48.676461 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2020-10-12 10:04:48.761842 I | ceph-object-store-user-controller: ceph object user "noobaa-ceph-objectstore-user" deleted successfully
2020-10-12 10:04:48.761860 I | ceph-spec: removing finalizer "cephobjectstoreuser.ceph.rook.io" on "noobaa-ceph-objectstore-user"
2020-10-12 10:04:48.789423 I | ceph-spec: object "rook-ceph-object-user-ocs-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user" matched on delete, reconciling
2020-10-12 10:04:51.297127 I | op-mon: parsing mon endpoints: a=172.30.14.78:6789,b=172.30.38.79:6789,c=172.30.169.120:6789

```

Comment 7 Santosh Pillai 2020-10-12 10:39:09 UTC
@Travis, a couple of questions:

1. Is it safe to call --purge-data for deleting object users in Rook?
   Say,
     - If the cleanup policy is set, then delete the object user with the --purge-data flag.

2. Instead, should NooBaa be deleting these user buckets if it created them in the first place?

@Romy: Please see question 2 above.

Comment 8 Travis Nielsen 2020-10-12 14:26:53 UTC
Rook has the expected behavior of blocking the user removal if there are any buckets associated with the user. How was the bucket created? Was it not with an OBC? If the OBC was deleted, the bucket should be deleted, and then the user would be deleted. But if the bucket hasn't been deleted, it's dangerous to always purge the bucket when deleting the user.

If the yes-really-destroy-data policy is set on the cluster CR, agreed that we can go ahead and purge the user.
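
For reference, a manual sketch of that check from the rook-ceph toolbox: list what the user still owns before deciding whether to purge. This mirrors the workaround in comment 6, not the operator's code path, and --purge-data should only be used when destroying the data is intended (e.g. the yes-really-destroy-data cleanup policy):

```
# List the buckets still owned by the NooBaa object store user.
radosgw-admin bucket list --uid=noobaa-ceph-objectstore-user

# Plain removal is expected to fail while buckets remain ("must specify purge data ...").
radosgw-admin user rm --uid=noobaa-ceph-objectstore-user

# Destructive: also removes the user's buckets and objects; only when data loss is intended.
radosgw-admin user rm --uid=noobaa-ceph-objectstore-user --purge-data
```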

Comment 9 Sébastien Han 2020-10-12 15:05:27 UTC
I believe there might be a sequencing issue here. Talur, what's the deletion sequence in the ocs-operator? Which resources get deleted first?
If we want to remove everything, the CephCluster CR should be deleted first.

Thanks

@talur

Comment 10 Santosh Pillai 2020-10-13 04:25:15 UTC
@leseb Deletion sequence in OCS: https://github.com/openshift/ocs-operator/blob/master/pkg/controller/storagecluster/uninstall_reconciler.go#L328
1. Set uninstall policy on Rook
2. Set uninstall policy on NooBaa
3. Delete NooBaa systems
4. Delete Ceph object store users
5. Delete Ceph object stores
6. Delete Ceph file systems
7. Delete Ceph block pools
8. Delete Ceph cluster
9. Delete snapshot classes
10. Delete storage classes
11. Delete node taints
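
For illustration only, a rough manual approximation of steps 3-10 with oc, using the default resource names seen elsewhere in this BZ; the actual ordering and the policy/taint steps are handled by the reconciler linked above:

```
# Illustrative only: approximate the reconciler's deletion order by hand.
oc delete -n openshift-storage noobaa --all
oc delete -n openshift-storage cephobjectstoreuser --all
oc delete -n openshift-storage cephobjectstore --all
oc delete -n openshift-storage cephfilesystem --all
oc delete -n openshift-storage cephblockpool --all
oc delete -n openshift-storage cephcluster --all
oc delete volumesnapshotclass ocs-storagecluster-rbdplugin-snapclass ocs-storagecluster-cephfsplugin-snapclass
oc delete storageclass ocs-storagecluster-ceph-rbd ocs-storagecluster-cephfs ocs-storagecluster-ceph-rgw
```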

Comment 11 Santosh Pillai 2020-10-13 06:32:19 UTC
(In reply to Travis Nielsen from comment #8)
> Rook has the expected behavior of blocking the user removal if there are any
> buckets associated with the user. How was the bucket created? Was it not
> with an OBC? If the OBC was deleted, the bucket should be deleted, and then
> the user would be deleted. But if the bucket hasn't deleted, it's dangerous
> to always purge the bucket when deleting the user. 

It doesn't look like the bucket was created by an OBC. There were no OBCs present during the uninstall.

bash-4.4$ radosgw-admin user list
[
    "noobaa-ceph-objectstore-user",
    "rook-ceph-internal-s3-user-checker-7dfb9ca5-1f97-4421-b5c6-d77f20c7fa05"
]
bash-4.4$ radosgw-admin bucket list
[
    "nb.1602567598484.origin-ci-int-aws.dev.rhcloud.com",
    "rook-ceph-bucket-checker-7dfb9ca5-1f97-4421-b5c6-d77f20c7fa05"
]

> 
> If the yes-really-destroy-data policy is set on the cluster CR, agreed that
> we can go ahead and purge the user.
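
To see which user owns the leftover NooBaa bucket before any purge decision, the bucket stats can be checked from the toolbox (bucket name taken from the listing above):

```
# The "owner" field in the output shows which user holds the bucket.
radosgw-admin bucket stats --bucket=nb.1602567598484.origin-ci-int-aws.dev.rhcloud.com
```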

Comment 12 Sébastien Han 2020-10-13 07:56:46 UTC
Thanks Santosh. The order should change as explained in my previous comment: the CephCluster CR must be deleted first, then all the other Ceph resources.
This is not a Rook issue, moving to OCS operator.

Comment 13 Mudit Agarwal 2020-10-14 14:44:29 UTC
Providing dev ack as the fix for this should be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1886859

Fix is under test and we should have a PR soon.

Comment 20 Mudit Agarwal 2020-10-21 10:49:37 UTC
Backport PR is not yet merged.

Comment 21 Neha Berry 2020-10-28 16:55:18 UTC
Verified the fix on an OCS 4.6 (4.6.0-144.ci) external mode cluster. Will test in internal mode too, before moving the BZ to verified state.


1. Created an OCS external mode cluster. The cluster was in Connected state.
2. Triggered OCS uninstall

Observation:

The storage cluster deletion is no longer stuck on the CephObjectStoreUser still existing.

OCP = 4.6.0-0.nightly-2020-10-22-034051
OCS = ocs-operator.v4.6.0-144.ci

_________________________________________________________________________________________________

Before triggering uninstall
=========================

Wed Oct 28 16:45:23 UTC 2020
--------------
========CSV ======
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-144.ci   OpenShift Container Storage   4.6.0-144.ci              Succeeded
--------------
=======PODS ======
NAME                                            READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
csi-cephfsplugin-hh85d                          3/3     Running   0          35m   10.1.160.165   compute-0   <none>           <none>
csi-cephfsplugin-n7rgp                          3/3     Running   0          35m   10.1.160.180   compute-2   <none>           <none>
csi-cephfsplugin-nvnmn                          3/3     Running   0          35m   10.1.160.161   compute-1   <none>           <none>
csi-cephfsplugin-provisioner-56455449bd-6cmhn   6/6     Running   0          35m   10.131.0.205   compute-1   <none>           <none>
csi-cephfsplugin-provisioner-56455449bd-bnnvk   6/6     Running   0          35m   10.129.2.94    compute-2   <none>           <none>
csi-rbdplugin-68wgt                             3/3     Running   0          35m   10.1.160.165   compute-0   <none>           <none>
csi-rbdplugin-6xfvz                             3/3     Running   0          35m   10.1.160.180   compute-2   <none>           <none>
csi-rbdplugin-7wjdv                             3/3     Running   0          35m   10.1.160.161   compute-1   <none>           <none>
csi-rbdplugin-provisioner-586fc6cfc-d55ds       6/6     Running   0          35m   10.128.2.68    compute-0   <none>           <none>
csi-rbdplugin-provisioner-586fc6cfc-nh2br       6/6     Running   0          35m   10.131.0.204   compute-1   <none>           <none>
noobaa-core-0                                   1/1     Running   0          35m   10.128.2.69    compute-0   <none>           <none>
noobaa-db-0                                     1/1     Running   0          35m   10.131.0.206   compute-1   <none>           <none>
noobaa-endpoint-58dc95697d-4gnzc                1/1     Running   0          34m   10.131.0.207   compute-1   <none>           <none>
noobaa-operator-7bcf846c94-h722m                1/1     Running   0          36m   10.131.0.203   compute-1   <none>           <none>
ocs-metrics-exporter-777dc7b97f-4v4hm           1/1     Running   0          36m   10.129.2.93    compute-2   <none>           <none>
ocs-operator-86846df567-gmp25                   1/1     Running   0          36m   10.129.2.91    compute-2   <none>           <none>
rook-ceph-operator-f44db9fbf-4bkrh              1/1     Running   0          36m   10.129.2.92    compute-2   <none>           <none>
--------------
======= PVC ==========
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
db-noobaa-db-0   Bound    pvc-4c1a12e0-d866-4fe0-842d-95061698db86   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   35m
--------------
======= storagecluster ==========
NAME                          AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   35m   Ready   true       2020-10-28T16:10:00Z   4.6.0

>> while true; do oc get cephobjectstore -n openshift-storage ; oc get cephobjectstoreuser; sleep 5; done


NAME                                          AGE
ocs-external-storagecluster-cephobjectstore   35m
NAME                           AGE
noobaa-ceph-objectstore-user   35m


2. Deleted the storage cluster:

$ date --utc; oc delete -n openshift-storage storagecluster --all --wait=true
Wed Oct 28 16:45:42 UTC 2020
storagecluster.ocs.openshift.io "ocs-external-storagecluster" deleted


>> rook-log snip


2020-10-28 16:46:01.516215 E | ceph-object-store-user-controller: failed to reconcile failed to delete ceph object user "noobaa-ceph-objectstore-user": failed to delete ceph object user "noobaa-ceph-objectstore-user". . could not remove user: unable to remove user, must specify purge data to remove user with buckets: failed to delete s3 user: exit status 17
2020-10-28 16:46:02.575081 I | ceph-spec: object "rook-ceph-config" matched on delete, reconciling
2020-10-28 16:46:02.575201 I | ceph-spec: removing finalizer "cephcluster.ceph.rook.io" on "ocs-external-storagecluster-cephcluster"
2020-10-28 16:46:02.591833 E | clusterdisruption-controller: cephcluster "openshift-storage/ocs-external-storagecluster-cephcluster" seems to be deleted, not requeuing until triggered again
2020-10-28 16:46:02.639919 I | ceph-spec: object "rook-ceph-mgr-external" matched on delete, reconciling
2020-10-28 16:46:02.711974 E | clusterdisruption-controller: cephcluster "openshift-storage/" seems to be deleted, not requeuing until triggered again
2020-10-28 16:46:02.712153 I | ceph-spec: removing finalizer "cephobjectstore.ceph.rook.io" on "ocs-external-storagecluster-cephobjectstore"
2020-10-28 16:46:02.739777 E | clusterdisruption-controller: cephcluster "openshift-storage/" seems to be deleted, not requeuing until triggered again
2020-10-28 16:46:02.755733 I | ceph-spec: object "rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore" matched on delete, reconciling
2020-10-28 16:46:02.795772 E | ceph-object-store-user-controller: failed to reconcile failed to populate cluster info: not expected to create new cluster info and did not find existing secret
2020-10-28 16:46:03.796028 I | ceph-spec: removing finalizer "cephobjectstoreuser.ceph.rook.io" on "noobaa-ceph-objectstore-user"
2020-10-28 16:46:03.825505 I | ceph-spec: object "rook-ceph-object-user-ocs-external-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user" matched on delete, reconciling



>> ocs-op snip


{"level":"info","ts":"2020-10-28T16:46:02.712Z","logger":"controller_storagecluster","msg":"Uninstall in progress","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","Status":"Uninstall: Waiting for cephObjectStore ocs-external-storagecluster-cephobjectstore to be deleted"}
{"level":"info","ts":"2020-10-28T16:46:02.756Z","logger":"controller_storagecluster","msg":"Reconciling external StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster"}
{"level":"info","ts":"2020-10-28T16:46:02.798Z","logger":"controller_storagecluster","msg":"Uninstall: CephCluster not found, can't set the cleanup policy and uninstall mode","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster"}
{"level":"info","ts":"2020-10-28T16:46:02.798Z","logger":"controller_storagecluster","msg":"Uninstall: NooBaa not found, can't set UninstallModeForced","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster"}
{"level":"info","ts":"2020-10-28T16:46:02.798Z","logger":"controller_storagecluster","msg":"NooBaa and noobaa-core PVC not found.","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster"}
{"level":"info","ts":"2020-10-28T16:46:02.798Z","logger":"controller_storagecluster","msg":"Uninstall: CephCluster not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster"}
{"level":"info","ts":"2020-10-28T16:46:02.798Z","logger":"controller_storagecluster","msg":"Uninstall: CephObjectStoreUser not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","CephObjectStoreUser Name":"ocs-external-storagecluster-cephobjectstoreuser"}
{"level":"info","ts":"2020-10-28T16:46:02.798Z","logger":"controller_storagecluster","msg":"Uninstall: CephObjectStore not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","CephObjectStore Name":"ocs-external-storagecluster-cephobjectstore"}
{"level":"info","ts":"2020-10-28T16:46:02.898Z","logger":"controller_storagecluster","msg":"Uninstall: CephFilesystem not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","CephFilesystem Name":"ocs-external-storagecluster-cephfilesystem"}
{"level":"info","ts":"2020-10-28T16:46:02.999Z","logger":"controller_storagecluster","msg":"Uninstall: CephBlockPool not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","CephBlockPool Name":"ocs-external-storagecluster-cephblockpool"}


>>while true; do oc get cephobjectstore -n openshift-storage ; oc get cephobjectstoreuser; sleep 5; done
No resources found in openshift-storage namespace.
No resources found in openshift-storage namespace.

Comment 22 Neha Berry 2020-10-28 19:02:20 UTC
Created attachment 1724898 [details]
rook-logs

Verified the same on an internal mode cluster on VMware, version = ocs-operator.v4.6.0-147.ci

Steps performed

1. Created 1 OBC, 2 PVCs, 2 VolumeSnapshots.
2. Deleted the storagecluster, but it was stuck as there were OBCs/PVCs still existing.

======= storagecluster ==========
NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   91m   Deleting              2020-10-28T17:01:52Z   4.6.0
--------------
======= cephcluster ==========
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE      MESSAGE                    HEALTH
ocs-storagecluster-cephcluster   /var/lib/rook     3          91m   Deleting   Failed to delete cluster   HEALTH_OK


3. Deleted the OBCs and PVCs, and the storagecluster deletion progressed.
4. The cleanup pods were created and went to Completed state.

Wed Oct 28 18:38:55 UTC 2020
--------------
========CSV ======

--------------
======= storagecluster ==========
--------------
======= cephcluster ==========

$ date --utc ; time oc delete -n openshift-storage storagecluster --all --wait=true; date --utc
Wed Oct 28 18:20:14 UTC 2020
storagecluster.ocs.openshift.io "ocs-storagecluster" deleted

real	18m39.408s
user	0m0.506s
sys	0m0.125s
Wed Oct 28 18:38:54 UTC 2020
[nberry@localhost before]$ 


5. The cephobjectstoreuser (noobaa-ceph-objectstore-user) ultimately got deleted, ~6 minutes after the storagecluster deletion.


The Ceph objectstore user is ultimately deleted:

>>rook-op-log snip
2020-10-28 18:39:30.800540 I | ceph-cluster-controller: all ceph daemons are cleaned up
2020-10-28 18:39:30.800544 I | ceph-cluster-controller: starting clean up job on node "compute-0"
2020-10-28 18:39:30.838485 I | ceph-cluster-controller: starting clean up job on node "compute-2"
2020-10-28 18:39:30.857747 I | ceph-cluster-controller: starting clean up job on node "compute-1"
2020-10-28 18:44:08.026977 I | ceph-spec: removing finalizer "cephobjectstoreuser.ceph.rook.io" on "noobaa-ceph-objectstore-user"
2020-10-28 18:44:08.104127 I | ceph-spec: object "rook-ceph-object-user-ocs-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user" matched on delete, reconciling


>>ocs-op logs

{"level":"info","ts":"2020-10-28T18:38:54.178Z","logger":"controller_storagecluster","msg":"Uninstall: CephObjectStore not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","CephObjectStore Name":"ocs-storagecluster-cephobjectstore"}
{"level":"info","ts":"2020-10-28T18:38:54.178Z","logger":"controller_storagecluster","msg":"Uninstall: CephFilesystem not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","CephFilesystem Name":"ocs-storagecluster-cephfilesystem"}
{"level":"info","ts":"2020-10-28T18:38:54.178Z","logger":"controller_storagecluster","msg":"Uninstall: CephBlockPool not found","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","CephBlockPool Name":"ocs-storagecluster-cephblockpool"}
{"level":"info","ts":"2020-10-28T18:38:54.178Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting SnapshotClass ocs-storagecluster-rbdplugin-snapclass","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.188Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting SnapshotClass ocs-storagecluster-cephfsplugin-snapclass","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.194Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting StorageClass ocs-storagecluster-cephfs","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.200Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting StorageClass ocs-storagecluster-ceph-rbd","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.207Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting StorageClass ocs-storagecluster-ceph-rgw","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.216Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting OCS NodeTolerationKey from the node compute-2","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.227Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting OCS NodeTolerationKey from the node compute-0","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.236Z","logger":"controller_storagecluster","msg":"Uninstall: Deleting OCS NodeTolerationKey from the node compute-1","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.245Z","logger":"controller_storagecluster","msg":"Removing finalizer","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.262Z","logger":"controller_storagecluster","msg":"Object is terminated, skipping reconciliation","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-28T18:38:54.274Z","logger":"controller_storagecluster","msg":"No StorageCluster resource","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}



6. Deleted the namespace and the deletion succeeded:

$ oc delete namespace openshift-storage
namespace "openshift-storage" deleted
[nberry@localhost oct28-147.ci]$ oc get project openshift-storage -o yaml ; date --utc
Error from server (NotFound): namespaces "openshift-storage" not found


Hence, moving the BZ to verified state.

___________________________________________________________
pods after deletion of storagecluster
=====================================

Wed Oct 28 18:53:16 UTC 2020
--------------
========CSV ======
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-147.ci   OpenShift Container Storage   4.6.0-147.ci              Succeeded
--------------
=======PODS ======
NAME                                           READY   STATUS      RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
cluster-cleanup-job-compute-0-vrgw5            0/1     Completed   0          13m     10.128.2.120   compute-0   <none>           <none>
cluster-cleanup-job-compute-1-rpd25            0/1     Completed   0          13m     10.131.0.222   compute-1   <none>           <none>
cluster-cleanup-job-compute-2-vc6q5            0/1     Completed   0          13m     10.129.2.121   compute-2   <none>           <none>
compute-0-debug                                1/1     Running     0          3m15s   10.1.160.165   compute-0   <none>           <none>
csi-cephfsplugin-hzpqm                         3/3     Running     0          111m    10.1.160.161   compute-1   <none>           <none>
csi-cephfsplugin-provisioner-98d99f679-c7kvm   6/6     Running     0          111m    10.131.0.213   compute-1   <none>           <none>
csi-cephfsplugin-provisioner-98d99f679-krjjm   6/6     Running     0          111m    10.129.2.109   compute-2   <none>           <none>
csi-cephfsplugin-tlpm5                         3/3     Running     0          111m    10.1.160.180   compute-2   <none>           <none>
csi-cephfsplugin-z8lt4                         3/3     Running     0          111m    10.1.160.165   compute-0   <none>           <none>
csi-rbdplugin-bmwl4                            3/3     Running     0          111m    10.1.160.180   compute-2   <none>           <none>
csi-rbdplugin-cmb8l                            3/3     Running     0          111m    10.1.160.165   compute-0   <none>           <none>
csi-rbdplugin-m7gv6                            3/3     Running     0          111m    10.1.160.161   compute-1   <none>           <none>
csi-rbdplugin-provisioner-7d5fc5cf64-f65tb     6/6     Running     0          111m    10.128.2.74    compute-0   <none>           <none>
csi-rbdplugin-provisioner-7d5fc5cf64-jh7kt     6/6     Running     0          111m    10.131.0.212   compute-1   <none>           <none>
noobaa-operator-549d7c6f56-vvlwj               1/1     Running     0          112m    10.129.2.107   compute-2   <none>           <none>
ocs-metrics-exporter-674fccb975-pkdld          1/1     Running     0          112m    10.129.2.108   compute-2   <none>           <none>
ocs-operator-67d7b745bd-h5k2n                  1/1     Running     0          112m    10.129.2.106   compute-2   <none>           <none>
rook-ceph-operator-6994879bbf-n9qvf            1/1     Running     0          112m    10.131.0.210   compute-1   <none>           <none>
--------------
======= PVC ==========

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ocs-deviceset-thin-0-data-0-njqt8   Bound    pvc-534a030a-3b71-4c72-8b3c-f20f1dcc117a   512Gi      RWO            thin           108m
ocs-deviceset-thin-1-data-0-g7v7n   Bound    pvc-c8d1aa8d-bb6c-45a3-99e0-e1e078c31e72   512Gi      RWO            thin           108m
ocs-deviceset-thin-2-data-0-p6wkq   Bound    pvc-315fcb81-ffbd-4398-a67e-66b7e43f6251   512Gi      RWO            thin           108m
--------------
======= storagecluster ==========

No resources found in openshift-storage namespace.
--------------
======= cephcluster ==========
No resources found in openshift-storage namespace.

Comment 23 Neha Berry 2020-10-28 19:03:10 UTC
Moving the BZ to verified state based on outputs and observations in comment#21 and comment#22

Comment 26 errata-xmlrpc 2020-12-17 06:24:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

