Bug 2133683

Summary: After shutting down 2 worker nodes on the MS provider cluster 2 mons are down and ceph health is not recovered
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Dhruv Bindra <dbindra>
Component: rook Assignee: Travis Nielsen <tnielsen>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Neha Berry <nberry>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.10 CC: aeyal, fbalak, ikave, madam, mmuench, nberry, ocs-bugs, odf-bz-bot, omitrani, owasserm, tnielsen
Target Milestone: --- Flags: dbindra: needinfo? (ikave)
dbindra: needinfo? (ikave)
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2112021 Environment:
Last Closed: 2022-11-07 16:19:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2112021    

Comment 2 Travis Nielsen 2022-10-11 17:58:38 UTC
There is no must-gather, so please provide more details from the cluster:
1. Why are the mons in pending state? "oc describe pod" should show the reason. I suspect they have node affinity to the nodes that were just deleted.
2. Are you using host networking? If so, the mons will always be tied to their node, and you won't be able to take two mons down at the same time without bringing at least one of them back up. That's just not supported.
3. As long as two mons are down, everything else in the cluster will be down, including everything in the rook operator timing out when it tries to run ceph commands.
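For reference, point 3 follows from the mon quorum rule: Ceph monitors need a strict majority of the configured mons to be up. Below is a minimal sketch of that arithmetic, assuming the cluster runs 3 mons (the mon count is not stated in this bug, and the function names are hypothetical, for illustration only):

```python
def has_quorum(total_mons: int, mons_up: int) -> bool:
    """Return True when a strict majority of the configured mons is up."""
    return mons_up > total_mons // 2


def max_tolerated_failures(total_mons: int) -> int:
    """Largest number of mons that can be down while quorum still holds."""
    return (total_mons - 1) // 2


if __name__ == "__main__":
    total = 3  # assumption: 3 mons; not confirmed anywhere in this bug
    for down in range(total + 1):
        up = total - down
        print(f"{down} mon(s) down, {up} up: quorum={has_quorum(total, up)}")
```

Under that assumption, one mon down still leaves quorum, while two mons down does not, so ceph commands from the operator would be expected to time out until at least one of the two mons is back.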

Comment 3 Travis Nielsen 2022-10-24 15:17:13 UTC
Is this still an issue or shall we close this?

Comment 4 Dhruv Bindra 2022-10-28 05:41:13 UTC
Tagging @ikave to get more info about this bug as he is the QE assignee

Comment 6 Travis Nielsen 2022-11-03 19:19:00 UTC
Moving out of 4.12 while waiting for more details

Comment 7 Travis Nielsen 2022-11-07 16:19:50 UTC
Please reopen if there are more details to investigate