Bug 1955831 - [GSS][mon] rook-operator scales mons to 4 after healthCheck timeout
Summary: [GSS][mon] rook-operator scales mons to 4 after healthCheck timeout
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat
Component: rook
Version: 4.6
Hardware: All
OS: All
Target Milestone: ---
Assignee: Travis Nielsen
QA Contact: Shrivaibavi Raghaventhiran
Depends On:
Reported: 2021-04-30 22:43 UTC by Randy Martinez
Modified: 2021-05-12 18:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Github | rook rook issues 7797 | 0 | None | open | Mon failover can cause mons to fall out of quorum if the operator is disrupted in the middle of the failover | 2021-04-30 22:45:24 UTC
Github | rook rook pull 7884 | 0 | None | open | ceph: Persist expected mon endpoints immediately during mon failover | 2021-05-11 21:18:54 UTC

Comment 3 Shrivaibavi Raghaventhiran 2021-05-03 07:09:00 UTC
The OCS QE team is following up on this BZ. GSS and the development team can contact us if any help or information is needed from our end.

Comment 5 Travis Nielsen 2021-05-10 17:12:00 UTC
Acking for 4.8; we should also clone to 4.7.z after confirming the fix. The side effect of this issue, losing quorum, is too severe, and the workaround of resetting the mon quorum to a single mon for recovery is too involved.

Comment 6 Travis Nielsen 2021-05-11 21:18:54 UTC
The fix is low risk; we should backport to 4.7.z and 4.6.z.

Comment 8 Travis Nielsen 2021-05-12 18:33:47 UTC
This is merged downstream to 4.8 with https://github.com/openshift/rook/pull/235.
I'll clone for 4.7.z and 4.6.z.
