Back to bug 2120598

Who When What Removed Added
Sunil Kumar Acharya 2022-08-23 11:25:41 UTC Depends On 2120601
Sunil Kumar Acharya 2022-08-23 11:26:41 UTC Summary [GSS] ceph cluster unresponsive when 2 nodes of same zone is down in stretch cluster [GSS] [4.11.z-Clone] ceph cluster unresponsive when 2 nodes of same zone is down in stretch cluster
Sunil Kumar Acharya 2022-08-23 11:28:52 UTC Link ID Github rook/rook/pull/10717
Sunil Kumar Acharya 2022-08-23 11:29:33 UTC Status NEW POST
Eran Tamir 2022-08-23 11:58:41 UTC CC etamir
OpenShift BugZilla Robot 2022-08-25 18:53:59 UTC Link ID Github red-hat-storage/rook/pull/405
Prasad Desala 2022-08-26 05:48:27 UTC QA Contact nberry mashetty
krishnaram Karthick 2022-08-26 11:53:54 UTC CC kramdoss
RHEL Program Management 2022-08-26 11:54:02 UTC Target Release --- ODF 4.11.1
OpenShift BugZilla Robot 2022-08-26 11:57:54 UTC Status POST MODIFIED
Sunil Kumar Acharya 2022-08-26 11:59:31 UTC Flags needinfo?(tnielsen)
Sunil Kumar Acharya 2022-08-26 12:01:19 UTC Flags needinfo?(tnielsen)
errata-xmlrpc 2022-08-30 05:40:28 UTC Status MODIFIED ON_QA
Sunil Kumar Acharya 2022-09-06 08:36:59 UTC Flags needinfo?(tnielsen)
Travis Nielsen 2022-09-06 19:38:55 UTC Doc Text Cause: If the operator is restarted in the middle of a mon failover, multiple mons may be started on the same node, which reduces the mon quorum availability.

Consequence: Two mons could end up on the same node instead of spreading the mons across unique nodes.

Fix: The operator will 1) properly cancel mon failover if the mon failover times out, and 2) ensure that any extra mons are removed based on stretch topology or multiple mons running on the same node.

Result: Mon quorum will maintain proper spread across nodes and stretch topology.
Doc Type If docs needed, set a value Bug Fix
Mahesh Shetty 2022-09-09 11:15:19 UTC Status ON_QA VERIFIED
Travis Nielsen 2022-09-12 17:59:48 UTC Flags needinfo?(tnielsen)
Agil Antony 2022-09-14 06:51:48 UTC CC agantony
Flags needinfo?(tnielsen)
Agil Antony 2022-09-14 10:22:44 UTC Doc Text Cause: If the operator is restarted in the middle of a mon failover, multiple mons may be started on the same node, which reduces the mon quorum availability.

Consequence: Two mons could end up on the same node instead of spreading the mons across unique nodes.

Fix: The operator will 1) properly cancel mon failover if the mon failover times out, and 2) ensure that any extra mons are removed based on stretch topology or multiple mons running on the same node.

Result: Mon quorum will maintain proper spread across nodes and stretch topology.
Previously, two MONs could end up on the same node instead of being spread across unique nodes. This happened because, when the operator was restarted in the middle of a MON failover, multiple MONs could be started on the same node, which reduced the MON quorum availability.
With this update, the operator properly cancels the MON failover if the failover times out, and ensures that any extra MONs are removed, based on the stretch topology or on multiple MONs running on the same node.
errata-xmlrpc 2022-09-14 11:46:54 UTC Status VERIFIED RELEASE_PENDING
Travis Nielsen 2022-09-14 13:23:48 UTC Flags needinfo?(tnielsen)
errata-xmlrpc 2022-09-14 15:15:05 UTC Status RELEASE_PENDING CLOSED
Resolution --- ERRATA
Last Closed 2022-09-14 15:15:05 UTC
errata-xmlrpc 2022-09-14 15:15:25 UTC Link ID Red Hat Product Errata RHBA-2022:6525
Elad 2023-08-09 17:03:01 UTC CC odf-bz-bot
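
The Fix recorded in the Doc Text above mentions removing extra mons when several run on the same node. As a rough illustration only, the check could be sketched in Python as follows. This is not the Rook operator's code: the helper name `find_extra_mons`, the mon names, and the node names are all hypothetical, and the rule "keep the first mon by name on each node" is an assumption made for the example.

```python
from collections import defaultdict


def find_extra_mons(mon_to_node):
    """Return mons that share a node with another mon (hypothetical helper).

    mon_to_node maps a mon name to the node it runs on. On each node the
    first mon in sorted name order is kept; the others are reported as extra.
    """
    by_node = defaultdict(list)
    for mon, node in sorted(mon_to_node.items()):
        by_node[node].append(mon)

    extras = []
    for node in sorted(by_node):
        # Everything after the first mon on a node is a removal candidate.
        extras.extend(by_node[node][1:])
    return extras


# Assumed example layout: mons "b" and "c" ended up on the same node.
mons = {"a": "node-1", "b": "node-2", "c": "node-2"}
print(find_extra_mons(mons))
```

A real operator would also have to account for the stretch topology (zones) and for quorum health before removing anything; the sketch covers only the same-node case.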
