Bug 2228555 - blockpool stuck on deleting when referred to storage class [NEEDINFO]
Summary: blockpool stuck on deleting when referred to storage class
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Travis Nielsen
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-02 16:07 UTC by Daniel Osypenko
Modified: 2023-08-15 16:03 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
dosypenk: needinfo? (badhikar)



Description Daniel Osypenko 2023-08-02 16:07:55 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When deleting a blockpool that is referenced by a storage class, the output shows that the blockpool was deleted but the command stays stuck in progress indefinitely; deleting the blockpool again shows that the blockpool does not exist;
getting the blockpool status from Ceph or from the oc cmd shows that the blockpool is Ready.

danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpools -n openshift-storage
NAME                                       PHASE
cbp-test-04e89e5772d64e5c97ac468df6f7a3a   Ready
ocs-storagecluster-cephblockpool           Ready
danielosypenko@dosypenk-mac ocs-ci % oc delete cephblockpool 'cbp-test-04e89e5772d64e5c97ac468df6f7a3a' -n openshift-storage
cephblockpool.ceph.rook.io "cbp-test-04e89e5772d64e5c97ac468df6f7a3a" deleted
^C%  <-------- ABORTED AFTER 10 MIN
danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpools -n openshift-storage
NAME                                       PHASE
cbp-test-04e89e5772d64e5c97ac468df6f7a3a   Ready
ocs-storagecluster-cephblockpool           Ready

danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpool cbp-test-3ccc40827cb34d7596050cb24cc4199                           
Error from server (NotFound): cephblockpools.ceph.rook.io "cbp-test-3ccc40827cb34d7596050cb24cc4199" not found  <-------- BLOCKPOOL not found, but ready
danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpools -n openshift-storage                                              
NAME                                       PHASE
cbp-test-04e89e5772d64e5c97ac468df6f7a3a   Ready
ocs-storagecluster-cephblockpool           Ready
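
For reference, a quick way to check whether the CR is actually marked for deletion (deletionTimestamp set, finalizers still pending) rather than untouched; a minimal sketch using the pool name from the output above:

oc get cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'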


Version of all relevant components (if applicable):

OC version:
Client Version: 4.13.4
Kustomize Version: v4.5.7
Server Version: 4.13.0-0.nightly-2023-07-27-134342
Kubernetes Version: v1.26.6+73ac561

OCS version:
ocs-operator.v4.13.1-rhodf              OpenShift Container Storage   4.13.1-rhodf   ocs-operator.v4.13.0-rhodf              Succeeded

Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-07-27-134342   True        False         27h     Cluster version is 4.13.0-0.nightly-2023-07-27-134342

Rook version:
rook: v4.13.1-0.b57f0c7db8116e754fc77b55825d7fd75c6f1aa3
go: go1.19.10

Ceph version:
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
The script may get stuck; minor impact.

Is there any workaround available to the best of your knowledge?
Abort the script and delete the storage class if needed (rough sketch below).
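
A minimal sketch of that workaround (the storage class name sc-cbp-test is hypothetical, and whether removing it unblocks the pending pool deletion was not verified here):

oc delete storageclass sc-cbp-test
oc delete cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage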

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:
-

Steps to Reproduce:
1. Deploy an ODF cluster.
2. Create a storage class of interface CephBlockPool with a new RBD pool (example manifests below).
3. Delete the CephBlockPool.
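
Example manifests for step 2; a minimal sketch only: the names (cbp-test, sc-cbp-test) are hypothetical and the CSI parameters follow the usual ODF defaults, so adjust for your cluster:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: cbp-test
  namespace: openshift-storage
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-cbp-test
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: cbp-test
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true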


Actual results:

The `oc delete cephblockpool` cmd gets stuck in progress, the cephblockpool stays Ready, and the next delete cmd falsely signals: cephblockpool.ceph.rook.io <pool-name> deleted

Expected results:

The delete should be rejected, with output similar to the Ceph error message:

"Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool cbp-test-04e89e5772d64e5c97ac468df6f7a3a.  If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it."

Additional info:
reproduced on OCP 4.10, ODF 4.10

Comment 6 Daniel Osypenko 2023-08-03 08:36:22 UTC
The current behavior also produces an inconsistency between the UI and CLI interfaces.

After this rejected deletion the user can open the pool page and see its status, metrics, etc. like any other pool, but clicking the Actions dropdown button shows that the resource is being deleted.
The Edit label, Edit annotations, and Edit block pool actions are disabled from then on.

From the CLI the user may continue to patch the cephblockpool:

oc patch cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage --type=merge -p '{"metadata":{"labels":{"new-label":"label-value"}}}'  
cephblockpool.ceph.rook.io/cbp-test-04e89e5772d64e5c97ac468df6f7a3a patched


What should be the course of action for a user who accidentally tried to remove the pool and wants to continue working with it and editing it via the UI?

Comment 7 Travis Nielsen 2023-08-03 18:28:03 UTC
> The current behavior also produces an inconsistency between the UI and CLI interfaces.

The inconsistency is that the CLI user can update the pool, but the UI cannot? It seems valid that the UI cannot update the deleted resource, but it is a reasonable advanced scenario for a CLI user to update the resource.


> What should be the course of action for a user who accidentally tried to remove the pool and wants to continue working with it and editing it via the UI?

K8s doesn't allow undeleting a resource, so the only way to bring the resource back to a non-deleted state is to delete it and create it again with the same spec.
There are instructions upstream for this: https://rook.io/docs/rook/latest/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion
If we have customer issues with it downstream, we should get something similar into the downstream docs.
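
A minimal sketch of the "delete and re-create with the same spec" idea, assuming the spec was captured before the CR disappeared (the pool name cbp-test is hypothetical, and uid/resourceVersion/status must be stripped from the saved yaml before re-creating):

oc get cephblockpool cbp-test -n openshift-storage -o yaml > cbp-test.yaml
# ... once the original CR is fully gone ...
oc create -f cbp-test.yaml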

Comment 8 Travis Nielsen 2023-08-15 15:24:24 UTC
Shall we close this? It sounds like expected behavior from the ODF perspective. Were you able to recover?

Comment 9 Daniel Osypenko 2023-08-15 15:36:01 UTC
Hello Travis. I cannot find documentation for OCP similar to https://rook.io/docs/rook/latest/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion
I also tried to restore the blockpool by following the steps in that instruction and failed on step 6, `oc create -f cluster.yaml`. I am not sure it is applicable to an OCP cluster.
Do you think we can add such instructions to our documents?

Comment 10 Travis Nielsen 2023-08-15 16:03:57 UTC
Restoring a deleted CR is really a risky operation, so at best it should be a KCS article and the support team should be involved in the restoration.

You said the CephBlockPool was deleted, right? Then I wouldn't expect the cluster.yaml to be created again. The restore needs to be only for the specific CephBlockPool. That upstream document is intended as an example that needs to be adjusted depending on which CR is being restored.
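
A rough sketch of how that example might be adapted to a single CephBlockPool, loosely following the shape of the upstream procedure (resource names are hypothetical; the operator deployment name and the finalizer handling should be verified on the live cluster, and support should be involved before doing this on real data):

# save the spec of the stuck CR
oc get cephblockpool cbp-test -n openshift-storage -o yaml > cbp-test.yaml
# stop the Rook operator so it does not react while the CR is manipulated
oc scale deployment rook-ceph-operator -n openshift-storage --replicas=0
# let the pending delete complete by clearing the finalizers
oc patch cephblockpool cbp-test -n openshift-storage --type=merge -p '{"metadata":{"finalizers":[]}}'
# re-create only the pool CR (strip uid/resourceVersion/status from the saved yaml first)
oc create -f cbp-test.yaml
# restart the operator
oc scale deployment rook-ceph-operator -n openshift-storage --replicas=1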

