Bug 2133683 - After shutting down 2 worker nodes on the MS provider cluster 2 mons are down and ceph health is not recovered [NEEDINFO]
Summary: After shutting down 2 worker nodes on the MS provider cluster 2 mons are down...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Travis Nielsen
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 2112021
Reported: 2022-10-11 07:35 UTC by Dhruv Bindra
Modified: 2023-08-09 17:03 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2112021
Environment:
Last Closed: 2022-11-07 16:19:50 UTC
Embargoed:
dbindra: needinfo? (ikave)


Comment 2 Travis Nielsen 2022-10-11 17:58:38 UTC
There is no must-gather, so please provide more details from the cluster:
1. Why are the mons in pending state? "oc describe pod" should show the reason. I suspect they have node affinity to the nodes that were just deleted.
2. Are you using host networking? If so, each mon is always tied to its node, and you cannot have two mons down at the same time; at least one of them has to be brought back up. That scenario is simply not supported.
3. As long as two mons are down, everything else in the cluster will be down, and every ceph command the rook operator tries to run will time out.
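
To illustrate point 3: ceph mons need a strict majority of the configured mons to form quorum. Below is a minimal sketch of that arithmetic, assuming a three-mon cluster (the mon count is not stated in this bug, and the function names are hypothetical, not part of rook or ceph):

```python
def quorum_size(total_mons: int) -> int:
    """Return the smallest number of mons that forms a strict majority."""
    if total_mons < 1:
        raise ValueError("total_mons must be at least 1")
    return total_mons // 2 + 1


def has_quorum(total_mons: int, mons_up: int) -> bool:
    """Return True when the mons that are up form a strict majority."""
    if not 0 <= mons_up <= total_mons:
        raise ValueError("mons_up must be between 0 and total_mons")
    return mons_up >= quorum_size(total_mons)


if __name__ == "__main__":
    total = 3  # assumption: three mons, not confirmed in this bug
    for down in range(total + 1):
        up = total - down
        print(f"{down} mon(s) down, {up} up: quorum={has_quorum(total, up)}")
```

Under that assumption, one mon down still leaves a majority, while two mons down does not, which is why ceph commands hang until at least one of the two mons is back.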

Comment 3 Travis Nielsen 2022-10-24 15:17:13 UTC
Is this still an issue or shall we close this?

Comment 4 Dhruv Bindra 2022-10-28 05:41:13 UTC
Tagging @ikave to get more info about this bug, as he is the QE assignee.

Comment 6 Travis Nielsen 2022-11-03 19:19:00 UTC
Moving out of 4.12 while waiting for more details

Comment 7 Travis Nielsen 2022-11-07 16:19:50 UTC
Please reopen if there are more details to investigate

