Description of problem (please be as detailed as possible and provide log snippets):

When deleting a blockpool that is referenced by a storage class, the command reports that the blockpool was deleted, but then hangs in progress indefinitely. Deleting the blockpool again reports that it does not exist, while querying the blockpool status from Ceph or via `oc` shows that the blockpool is still Ready.

```
danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpools -n openshift-storage
NAME                                       PHASE
cbp-test-04e89e5772d64e5c97ac468df6f7a3a   Ready
ocs-storagecluster-cephblockpool           Ready

danielosypenko@dosypenk-mac ocs-ci % oc delete cephblockpool 'cbp-test-04e89e5772d64e5c97ac468df6f7a3a' -n openshift-storage
cephblockpool.ceph.rook.io "cbp-test-04e89e5772d64e5c97ac468df6f7a3a" deleted
^C%    <-------- ABORTED AFTER 10 MIN

danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpools -n openshift-storage
NAME                                       PHASE
cbp-test-04e89e5772d64e5c97ac468df6f7a3a   Ready
ocs-storagecluster-cephblockpool           Ready

danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpool cbp-test-3ccc40827cb34d7596050cb24cc4199
Error from server (NotFound): cephblockpools.ceph.rook.io "cbp-test-3ccc40827cb34d7596050cb24cc4199" not found    <-------- BLOCKPOOL not found, but ready

danielosypenko@dosypenk-mac ocs-ci % oc get cephblockpools -n openshift-storage
NAME                                       PHASE
cbp-test-04e89e5772d64e5c97ac468df6f7a3a   Ready
ocs-storagecluster-cephblockpool           Ready
```

Version of all relevant components (if applicable):

OC version:
```
Client Version: 4.13.4
Kustomize Version: v4.5.7
Server Version: 4.13.0-0.nightly-2023-07-27-134342
Kubernetes Version: v1.26.6+73ac561
```

OCS version:
```
ocs-operator.v4.13.1-rhodf   OpenShift Container Storage   4.13.1-rhodf   ocs-operator.v4.13.0-rhodf   Succeeded
```

Cluster version:
```
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-07-27-134342   True        False         27h     Cluster version is 4.13.0-0.nightly-2023-07-27-134342
```

Rook version:
```
rook: v4.13.1-0.b57f0c7db8116e754fc77b55825d7fd75c6f1aa3
go: go1.19.10
```

Ceph version:
```
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)
```

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
A script may get stuck; minor impact.

Is there any workaround available to the best of your knowledge?
Abort the script and delete the storage class if needed.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible? yes

Can this issue be reproduced from the UI? yes

If this is a regression, please provide more details to justify this: -

Steps to Reproduce:
1. Deploy an ODF cluster.
2. Deploy a storage class of interface CephBlockPool with a new rbd pool.
3. Delete the cephblockpool (a sketch of steps 2-3 is shown below).

Actual results:
The `oc delete cephblockpool` command gets stuck in progress, the cephblockpool stays Ready, and the next delete command falsely signals: cephblockpool.ceph.rook.io "<pool-name>" deleted

Expected results:
The delete should be rejected, with output similar to the ceph error message:
"Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool cbp-test-04e89e5772d64e5c97ac468df6f7a3a. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it."

Additional info:
Reproduced on OCP 4.10, ODF 4.10.
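For reference, a minimal sketch of steps 2-3, assuming a hypothetical pool name `cbp-test` and storage class name `sc-test`; the StorageClass parameters are trimmed for illustration (a real ODF RBD storage class also carries CSI secret parameters):

```shell
# Step 2a: create a new CephBlockPool (replication settings are illustrative)
cat <<EOF | oc apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: cbp-test
  namespace: openshift-storage
spec:
  failureDomain: host
  replicated:
    size: 3
EOF

# Step 2b: create an RBD StorageClass that references the new pool
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-test
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: cbp-test
reclaimPolicy: Delete
EOF

# Step 3: delete the pool while the StorageClass still references it;
# this is the command that hangs in the transcript above
oc delete cephblockpool cbp-test -n openshift-storage
```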
The current behavior also produces an inconsistency between the UI and CLI interfaces. After this rejected deletion, the user can enter the pool page and see its Status, metrics, etc., just like for any other pool, but clicking the Actions dropdown button shows that the resource is being deleted: the "Edit labels", "Edit annotations", and "Edit block pool" actions are disabled from that point on. From the CLI, the user may continue to patch the cephblockpool:

```
oc patch cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage --type=merge -p '{"metadata":{"labels":{"new-label":"label-value"}}}'
cephblockpool.ceph.rook.io/cbp-test-04e89e5772d64e5c97ac468df6f7a3a patched
```

What should the actions of a user be who accidentally tried to remove the pool and wants to continue working with it and edit it via the UI?
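For what it's worth, the UI behavior matches a CR that is marked for deletion but held back by a finalizer. This can be confirmed from the CLI using standard Kubernetes metadata fields (nothing ODF-specific assumed here):

```shell
# A non-empty deletionTimestamp together with remaining finalizers means the
# CR is stuck terminating: the API server accepted the delete, but the
# operator's finalizer has not been removed yet.
oc get cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
```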
> The current behavior also produces an inconsistency between the UI and CLI interfaces.

The inconsistency is that the CLI user can update the pool, but the UI cannot? It seems valid that the UI cannot update the deleted resource, but it is a reasonable advanced scenario for a CLI user to update the resource.

> What should the actions of a user be who accidentally tried to remove the pool and wants to continue working with it and edit it via the UI?

K8s doesn't allow undeleting a resource, so the only way to bring the resource back to a non-deleted state is to delete it and create it again with the same spec. There are instructions upstream for this: https://rook.io/docs/rook/latest/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion If we have customer issues with it downstream, we should get something similar into the downstream docs.
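As a hedged sketch of the first half of that "delete and recreate with the same spec" idea for this specific case (the exact procedure should follow the upstream doc; the backup file name and the list of fields to strip are illustrative):

```shell
# Save the pool's spec before forcing the deletion through, so it can be
# recreated later with the same settings
oc get cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage \
  -o yaml > cbp-backup.yaml

# Edit cbp-backup.yaml to drop the status section and the server-managed
# metadata (uid, resourceVersion, creationTimestamp, deletionTimestamp,
# finalizers) before re-applying it
```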
Shall we close this? This sounds like expected behavior from the ODF perspective. Were you able to recover?
Hello Travis. I cannot find OCP documentation similar to https://rook.io/docs/rook/latest/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion. I also tried to restore the blockpool by following the steps in that instruction and failed on step 6, `oc create -f cluster.yaml`. I am not sure whether it is applicable to an OCP cluster. Do you think we can add such an instruction to our documents?
Restoring a deleted CR is really a risky operation, so at best it should be a KCS article, and the support team should be involved in the restoration. You said the CephBlockPool was deleted, right? Then I wouldn't expect cluster.yaml to be created again. The restore needs to be done only for the specific CephBlockPool. That upstream document is intended as an example that needs to be adjusted depending on which CR is being restored.
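For the record, a minimal pool-specific sketch of what that could look like, assuming the cleaned backup taken earlier in this thread; clearing finalizers is precisely the risky step that a KCS article and the support team should supervise, since Rook must then re-adopt the still-existing Ceph pool:

```shell
# Let the pending delete complete by clearing the finalizer on the stuck CR
oc patch cephblockpool cbp-test-04e89e5772d64e5c97ac468df6f7a3a -n openshift-storage \
  --type=merge -p '{"metadata":{"finalizers":null}}'

# Once the CR is gone, recreate only this CephBlockPool from the cleaned
# backup; there is no need to recreate cluster.yaml
oc create -f cbp-backup.yaml
```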