
Who When What Removed Added
Sunil Kumar Acharya 2022-08-23 11:26:57 UTC Summary [GSS] ceph cluster unresponsive when 2 nodes of same zone is down in stretch cluster → [GSS] [4.10.z-Clone] ceph cluster unresponsive when 2 nodes of same zone is down in stretch cluster
Sunil Kumar Acharya 2022-08-23 11:28:44 UTC Link ID Github rook/rook/pull/10717
Sunil Kumar Acharya 2022-08-23 11:29:21 UTC Status NEW → POST
Eran Tamir 2022-08-23 11:58:50 UTC CC etamir
OpenShift BugZilla Robot 2022-08-25 18:52:11 UTC Link ID Github red-hat-storage/rook/pull/405
OpenShift BugZilla Robot 2022-08-25 18:54:58 UTC Link ID Github red-hat-storage/rook/pull/406
Prasad Desala 2022-08-26 05:48:46 UTC QA Contact nberry → mashetty
krishnaram Karthick 2022-08-26 11:54:00 UTC CC kramdoss
RHEL Program Management 2022-08-26 11:54:06 UTC Target Release --- → ODF 4.10.6
Mudit Agarwal 2022-08-26 11:56:50 UTC Link ID Github red-hat-storage/rook/pull/405
Sunil Kumar Acharya 2022-08-26 11:58:43 UTC Flags needinfo?(tnielsen)
OpenShift BugZilla Robot 2022-08-26 13:57:38 UTC Status POST → MODIFIED
Travis Nielsen 2022-08-26 13:59:10 UTC Flags needinfo?(tnielsen)
errata-xmlrpc 2022-08-30 05:36:45 UTC Status MODIFIED → ON_QA
Sunil Kumar Acharya 2022-09-06 08:38:18 UTC Flags needinfo?(tnielsen)
Travis Nielsen 2022-09-06 19:38:38 UTC Doc Type If docs needed, set a value → Bug Fix
Doc Text Added: Cause: If the operator is restarted in the middle of a mon failover, multiple mons may be started on the same node, which reduces the mon quorum availability.

Consequence: Two mons could end up on the same node instead of spreading the mons across unique nodes.

Fix: The operator will 1) properly cancel mon failover if the mon failover times out, and 2) ensure that any extra mons are removed based on stretch topology or multiple mons running on the same node.

Result: Mon quorum will maintain proper spread across nodes and stretch topology.
Travis Nielsen 2022-09-12 18:00:00 UTC Flags needinfo?(tnielsen)
Mahesh Shetty 2022-09-20 09:39:44 UTC Flags needinfo?(tnielsen)
Olive Lakra 2022-09-20 14:50:11 UTC CC olakra
Doc Text Removed: Cause: If the operator is restarted in the middle of a mon failover, multiple mons may be started on the same node, which reduces the mon quorum availability.

Consequence: Two mons could end up on the same node instead of spreading the mons across unique nodes.

Fix: The operator will 1) properly cancel mon failover if the mon failover times out, and 2) ensure that any extra mons are removed based on stretch topology or multiple mons running on the same node.

Result: Mon quorum will maintain proper spread across nodes and stretch topology.
Added: Previously, the Ceph cluster would become unresponsive when two nodes of same zone is down in a stretch cluster. If the operator restarts in the middle of a mon failover, then many mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of spreading itself across unique nodes.

With this update, the operator can now cancel the mon failover when the mon failover times out, and ensures that extra mons get removed based on stretch topology or many mons running on the same node. Resulting in mon quorum maintaining proper spread across nodes and stretch topology.
Travis Nielsen 2022-09-20 16:52:17 UTC Flags needinfo?(tnielsen)
Travis Nielsen 2022-09-20 16:58:16 UTC Doc Text Removed: Previously, the Ceph cluster would become unresponsive when two nodes of same zone is down in a stretch cluster. If the operator restarts in the middle of a mon failover, then many mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of spreading itself across unique nodes.

With this update, the operator can now cancel the mon failover when the mon failover times out, and ensures that extra mons get removed based on stretch topology or many mons running on the same node. Resulting in mon quorum maintaining proper spread across nodes and stretch topology.
Added: Previously, the Ceph cluster would become unresponsive when two nodes of same zone are down in a stretch cluster. If the operator restarts in the middle of a mon failover, then multiple mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of being spread across unique nodes.

With this update, the operator can now cancel the mon failover when the mon failover times out. And in the event that an extra mon is started during an operator restart, the extra mon will be removed based on topology to ensure extra mons are not running on the same node or in the same zone, to maintain optimal topology spread.
Mahesh Shetty 2022-09-21 04:10:00 UTC Status ON_QA → VERIFIED
Olive Lakra 2022-09-21 05:06:54 UTC Doc Text Removed: Previously, the Ceph cluster would become unresponsive when two nodes of same zone are down in a stretch cluster. If the operator restarts in the middle of a mon failover, then multiple mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of being spread across unique nodes.

With this update, the operator can now cancel the mon failover when the mon failover times out. And in the event that an extra mon is started during an operator restart, the extra mon will be removed based on topology to ensure extra mons are not running on the same node or in the same zone, to maintain optimal topology spread.
Added: Previously, the Ceph cluster would become unresponsive when two nodes of same zone are down in a stretch cluster. If the operator restarts in the middle of a mon failover, then multiple mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of being spread across unique nodes.

With this update, the operator can now cancel the mon failover when the mon failover times out. And in the event that an extra mon is started during an operator restart, the extra mon will be removed based on topology to ensure these extra mons are not running on the same node or in the same zone, to maintain optimal topology spread.
errata-xmlrpc 2022-09-21 10:41:57 UTC Status VERIFIED → RELEASE_PENDING
errata-xmlrpc 2022-09-21 17:29:37 UTC Status RELEASE_PENDING → CLOSED
Resolution --- → ERRATA
Last Closed 2022-09-21 17:29:37 UTC
errata-xmlrpc 2022-09-21 17:29:43 UTC Link ID Red Hat Product Errata RHBA-2022:6675
Elad 2023-08-09 17:03:01 UTC CC odf-bz-bot
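
Note: The doc text in the history above describes the fix at a high level: cancel a mon failover that has timed out, and remove any extra mons that break the per-node or per-zone spread. The actual change lives in the linked Rook pull requests (rook/rook/pull/10717 and the red-hat-storage/rook backports 405/406). The Go sketch below is only a hypothetical illustration of the "remove the extra mon" idea; the type and function names are invented for clarity and are not the Rook implementation.

```go
// Hypothetical sketch of the behavior described in the doc text: after an
// operator restart interrupts a mon failover, two mons may end up on the same
// node. This is NOT the Rook code; names and types are invented.
package main

import (
	"fmt"
	"sort"
)

// monPlacement records where a monitor daemon landed.
type monPlacement struct {
	Name string // e.g. "a", "b", "c"
	Node string
	Zone string // in a stretch cluster, zone spread is also enforced
}

// extraMonsToRemove keeps the alphabetically-first mon on each node and flags
// any additional mons on that node as leftovers to be removed/failed over.
func extraMonsToRemove(mons []monPlacement) []string {
	sort.Slice(mons, func(i, j int) bool { return mons[i].Name < mons[j].Name })
	seenNode := map[string]bool{}
	var extras []string
	for _, m := range mons {
		if seenNode[m.Node] {
			// A mon already runs on this node; this one is the leftover
			// from the interrupted failover.
			extras = append(extras, m.Name)
			continue
		}
		seenNode[m.Node] = true
	}
	return extras
}

func main() {
	// Example: mon "f" was started on node2 during an interrupted failover,
	// even though mon "b" already runs there.
	mons := []monPlacement{
		{Name: "a", Node: "node1", Zone: "zone-a"},
		{Name: "b", Node: "node2", Zone: "zone-b"},
		{Name: "f", Node: "node2", Zone: "zone-b"},
		{Name: "c", Node: "node3", Zone: "zone-c"},
	}
	fmt.Println("extra mons to remove:", extraMonsToRemove(mons)) // [f]
}
```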
