Bug 1983756 - [Multus] Deletion of CephBlockPool gets stuck and blocks creation of new pools
Summary: [Multus] Deletion of CephBlockPool gets stuck and blocks creation of new pools
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: Sébastien Han
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Duplicates: 1982672
Depends On:
Blocks: 1966894 1993796 2011326
 
Reported: 2021-07-19 17:00 UTC by Sidhant Agrawal
Modified: 2023-08-09 17:03 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
.Deletion of `CephBlockPool` gets stuck and blocks the creation of new pools
Previously, in a Multus-enabled cluster, the Rook Operator did not have access to the object storage daemon (OSD) network because it did not have the network annotations. As a result, `rbd` commands run during pool cleanup would hang because the OSDs could not be contacted. With this release, the operator proxies `rbd` commands through a sidecar container in the `mgr` pod, so they run successfully during pool cleanup.
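As a rough illustration only (not Rook's actual Go implementation), the proxying idea can be sketched in Python with the kubernetes client's exec stream; the namespace, the mgr label selector, the sidecar container name "cmd-proxy", and the pool name below are assumptions:

# Hypothetical sketch: run an `rbd` command inside a sidecar container of the
# mgr pod (which is attached to the Multus networks) instead of running it
# from the operator pod, which has no route to the OSD network.
from kubernetes import client, config
from kubernetes.stream import stream

config.load_kube_config()                      # or load_incluster_config() when running in a pod
core = client.CoreV1Api()

namespace = "openshift-storage"                # assumption: default ODF namespace
mgr_pod = core.list_namespaced_pod(
    namespace, label_selector="app=rook-ceph-mgr"
).items[0]

output = stream(
    core.connect_get_namespaced_pod_exec,
    mgr_pod.metadata.name,
    namespace,
    container="cmd-proxy",                     # assumed sidecar container name
    command=["rbd", "ls", "--pool", "replicapool"],   # illustrative pool name
    stderr=True, stdin=False, stdout=True, tty=False,
)
print(output)

Running the command from a pod that carries the Multus network annotations is what lets it reach the OSDs.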
Clone Of:
Clones: 1993796
Environment:
Last Closed: 2021-12-13 17:44:54 UTC
Embargoed:


Attachments


Links
System ID                             | Private | Priority | Status | Summary                                                          | Last Updated
Github red-hat-storage rook pull 19   | 0       | None     | None   | None                                                             | 2021-09-09 10:12:25 UTC
Github red-hat-storage rook pull 28   | 0       | None     | open   | Bug 1983756: ceph: do not build all the args to remote exec cmd  | 2021-09-29 08:23:13 UTC
Github red-hat-storage rook pull 305  | 0       | None     | open   | Bug 1983756: ceph: only merge stderr on error                    | 2021-10-18 16:29:36 UTC
Github rook rook pull 8339            | 0       | None     | open   | ceph: proxy rbd commands when multus is enabled                  | 2021-07-20 09:52:00 UTC
Github rook rook pull 8659            | 0       | None     | None   | None                                                             | 2021-09-08 14:13:30 UTC
Github rook rook pull 8860            | 0       | None     | open   | multus: do not build all the args to exec commands               | 2021-09-28 16:58:40 UTC
Github rook rook pull 8995            | 0       | None     | open   | ceph: only merge stderr on error                                 | 2021-10-18 14:35:41 UTC
Red Hat Product Errata RHSA-2021:5086 | 0       | None     | None   | None                                                             | 2021-12-13 17:45:16 UTC

Description Sidhant Agrawal 2021-07-19 17:00:57 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In a Multus-enabled cluster, deletion of a CephBlockPool gets stuck and the pool is not deleted on the Ceph side.
This also blocks the creation of new CephBlockPools: they never enter the Ready phase and the new pool is not created on the Ceph side.
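A quick way to observe the stuck state is to list the CephBlockPool resources and check their phase and deletion timestamp. A rough Python sketch, assuming cluster access and the default openshift-storage namespace:

from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

pools = crd.list_namespaced_custom_object(
    group="ceph.rook.io", version="v1",
    namespace="openshift-storage", plural="cephblockpools",
)
for pool in pools.get("items", []):
    name = pool["metadata"]["name"]
    phase = pool.get("status", {}).get("phase", "<no status>")
    terminating = "deletionTimestamp" in pool["metadata"]
    # A pool stuck in deletion keeps its deletionTimestamp indefinitely, and
    # a newly created pool never reports the Ready phase.
    print(f"{name}: phase={phase}{' (terminating)' if terminating else ''}")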

Version of all relevant components (if applicable):
OCS: ocs-operator.v4.8.0-455.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes, I'm unable to delete the old CephBlockPool and create a new one.

Is there any workaround available to the best of your knowledge?
Delete the pool manually on the Ceph side using the toolbox.
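For reference, a rough sketch of that workaround, assuming `oc` is logged in to the cluster and the rook-ceph-tools pod is running in the openshift-storage namespace:

import subprocess

NAMESPACE = "openshift-storage"

def toolbox(*ceph_cmd: str) -> str:
    """Run a Ceph CLI command inside the rook-ceph-tools pod."""
    pod = subprocess.run(
        ["oc", "-n", NAMESPACE, "get", "pod", "-l", "app=rook-ceph-tools",
         "-o", "jsonpath={.items[0].metadata.name}"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return subprocess.run(
        ["oc", "-n", NAMESPACE, "exec", pod, "--", *ceph_cmd],
        capture_output=True, text=True, check=True,
    ).stdout

print(toolbox("ceph", "osd", "pool", "ls"))
# Deleting a pool is destructive and requires mon_allow_pool_delete=true:
# toolbox("ceph", "osd", "pool", "rm", "my-pool", "my-pool",
#         "--yes-i-really-really-mean-it")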

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:

Manual steps:
1. Install OCS with Multus enabled
2. Create a new blockpool 
3. Delete the blockpool created in step 2 (will not succeed)
4. Try to create another blockpool (will not succeed); a sketch of these steps follows below
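A rough sketch of steps 2-4 with the kubernetes Python client; the pool name, replica size, and namespace are illustrative assumptions:

from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()
GROUP, VERSION, NAMESPACE, PLURAL = "ceph.rook.io", "v1", "openshift-storage", "cephblockpools"

pool = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephBlockPool",
    "metadata": {"name": "test-pool-rep2", "namespace": NAMESPACE},
    "spec": {"replicated": {"size": 2}},
}

# Step 2: create a new block pool.
crd.create_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL, pool)

# Step 3: delete it. On an affected Multus cluster the resource stays in
# Terminating and the underlying Ceph pool is never removed.
crd.delete_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL, "test-pool-rep2")

# Step 4: any pool created after this never reaches the Ready phase.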

Or 

Run tests from tests/manage/storageclass [1]
For example, first run tests/manage/storageclass/test_create_2_sc_with_1_pool_comp_rep2.py::TestMultipleScOnePoolRep2Comp::test_multiple_sc_one_pool_rep2_comp
> The test will pass, but there will be an error in teardown due to the failure of the CephBlockPool deletion

Now run tests/manage/storageclass/test_create_2sc_at_once_with_io.py::TestCreate2ScAtOnceWithIo::test_new_sc_rep2_rep3_at_once 
> This test will fail during creation of the new CephBlockPool

[1] https://github.com/red-hat-storage/ocs-ci/tree/master/tests/manage/storageclass

Actual results:
Deletion of the existing pool gets stuck and a new pool cannot be created

Expected results:
Should be able to delete an existing pool and create new pools

Additional info:

Comment 3 Sébastien Han 2021-07-20 14:24:47 UTC
The downstream PR is ready but on hold, waiting for various acks and a blocker flag.

Comment 5 Mudit Agarwal 2021-07-21 14:52:51 UTC
Moving it out of 4.8 as discussed in the thread mentioned in the above comment.

Sebastien, please fill in the doc text.

Comment 7 Mudit Agarwal 2021-07-28 05:38:25 UTC
.Deletion of `CephBlockPool` gets stuck and blocks the creation of new pools
When Rook is deployed along with Multus, the Rook Operator does not have the network annotations and thus does not have access to the OSD network. This means that `rbd` commands run during pool cleanup hang because they cannot contact the OSDs. The workaround is to delete the CephBlockPool manually using the toolbox.
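The asymmetry described above can be seen by comparing pod annotations. A rough Python sketch (assuming the default openshift-storage namespace) prints the Multus attachment annotation for each pod, which, per the description above, the Ceph daemon pods carry but the operator pod does not:

from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

for pod in core.list_namespaced_pod("openshift-storage").items:
    annotations = pod.metadata.annotations or {}
    # k8s.v1.cni.cncf.io/networks is the standard Multus attachment annotation.
    networks = annotations.get("k8s.v1.cni.cncf.io/networks", "<none>")
    print(f"{pod.metadata.name}: networks={networks}")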

Comment 15 Travis Nielsen 2021-08-16 15:54:45 UTC
*** Bug 1982672 has been marked as a duplicate of this bug. ***

Comment 18 Sébastien Han 2021-09-08 13:21:38 UTC
Sidhant, please provide debug logs or access to the env, thanks.

Comment 19 Sidhant Agrawal 2021-09-08 13:49:22 UTC
Shared cluster details via gchat. Clearing NI.

Comment 20 Sébastien Han 2021-09-09 10:12:25 UTC
Resync PR https://github.com/red-hat-storage/rook/pull/19

Comment 21 Mudit Agarwal 2021-09-22 09:21:02 UTC
The fix should be available in the latest ODF builds.

Comment 27 Sébastien Han 2021-09-28 16:58:41 UTC
I don't need the env anymore, thanks for providing it.

Comment 30 Sébastien Han 2021-10-18 16:29:59 UTC
https://github.com/red-hat-storage/rook/pull/305

Comment 33 Mudit Agarwal 2021-11-03 04:15:26 UTC
The doc text needs to be changed; earlier it was a known issue and now it is a bug fix.

Comment 37 errata-xmlrpc 2021-12-13 17:44:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086

