Bug 1941918
| Summary: | [Doc] [Arbiter] Disable mon failover in stretch mode | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Mudit Agarwal <muagarwa> |
| Component: | documentation | Assignee: | Olive Lakra <olakra> |
| Status: | CLOSED WONTFIX | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.7 | CC: | agantony, bniver, ebenahar, etamir, gfarnum, madam, mbukatov, muagarwa, nberry, ocs-bugs, olakra, owasserm, prsurve, rcyriac, shan, sostapov, tnielsen |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1939617 | Environment: | |
| Last Closed: | 2022-03-08 08:31:28 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1939007, 1939617, 1939766 | | |
| Bug Blocks: | | | |
Comment 4
Mudit Agarwal
2021-03-24 13:37:31 UTC
There isn't much to document, other than that they need to bring the node back up where the failed mon was running. Since the feature is in tech preview, do we really need to document this? Perhaps it's more appropriate for release notes.
It's either release notes or adding an important note or warning in the Tech preview section of the relevant guide. Whatever is preferable.
I see that we are considering enabling mon failover after all: https://bugzilla.redhat.com/show_bug.cgi?id=1939617#c17 What to document and how depends on the decision about this, and, if we decide that mon failover will still be disabled, on how exactly it should work, as we had some unexpected behaviour related to this (see https://bugzilla.redhat.com/show_bug.cgi?id=1939617#c15).
I proposed here that we leave mon failover disabled until 4.8: https://bugzilla.redhat.com/show_bug.cgi?id=1939617#c19 If that holds, we can simply document that if a node goes down with a mon, it needs to be brought back up to avoid the risk of losing more mons.
No, we don't need to port it to 4.8, as we are re-enabling mon failover in 4.8.
It should be
>> If a node with a failed mon goes down, it is important to fix the node to avoid the risk of losing more mons.
I might phrase it this way:
>> If a node with a failed mon goes down, it is important to bring the node back online to restore the mon. If three mons are permanently down, the cluster stops working.
Suggestion from Mudit looks good to me. Suggestion from Travis looks good as well; that said, if I understand it right, we can't avoid losing more than one mon, since with 2 mons down, the cluster loses quorum and basically stops working.
(In reply to Martin Bukatovic from comment #13)
> Suggestion from Mudit looks good to me. Suggestion from Travis looks good as
> well, that said if I understand it right, we can't avoid losing more than
> one mon, since with 2 mons down, cluster loses quorum and basically stops
> working.
Retracting my claim above. In arbiter mode, we have 5 mons ... Suggestion from Travis looks like the best option.
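For reference, the quorum arithmetic behind the retraction above can be sketched as follows. This is only an illustration, assuming the usual strict-majority rule for monitor quorum; it ignores any stretch-mode-specific site rules, and the function names are hypothetical, not part of Ceph or Rook.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest number of mons that forms a strict majority."""
    return total_mons // 2 + 1


def tolerated_failures(total_mons: int) -> int:
    """How many mons can be down while a majority still remains."""
    return total_mons - quorum_size(total_mons)


def has_quorum(total_mons: int, mons_down: int) -> bool:
    """True if the mons still up form a strict majority."""
    return total_mons - mons_down >= quorum_size(total_mons)


if __name__ == "__main__":
    # Compare a 3-mon cluster with the 5-mon arbiter layout.
    for total in (3, 5):
        print(
            f"{total} mons: quorum needs {quorum_size(total)}, "
            f"tolerates {tolerated_failures(total)} down"
        )
```

Under that assumption, a 3-mon cluster loses quorum with 2 mons down, while the 5-mon arbiter layout still has quorum with 2 down and loses it at 3, which matches the proposed wording "If three mons are permanently down, the cluster stops working."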