Back to bug 2120601
| Who | When | What | Removed | Added |
|---|---|---|---|---|
| Sunil Kumar Acharya | 2022-08-23 11:26:57 UTC | Summary | [GSS] ceph cluster unresponsive when 2 nodes of same zone is down in stretch cluster | [GSS] [4.10.z-Clone] ceph cluster unresponsive when 2 nodes of same zone is down in stretch cluster |
| Sunil Kumar Acharya | 2022-08-23 11:28:44 UTC | Link ID | Github rook/rook/pull/10717 | |
| Sunil Kumar Acharya | 2022-08-23 11:29:21 UTC | Status | NEW | POST |
| Eran Tamir | 2022-08-23 11:58:50 UTC | CC | etamir | |
| OpenShift BugZilla Robot | 2022-08-25 18:52:11 UTC | Link ID | Github red-hat-storage/rook/pull/405 | |
| OpenShift BugZilla Robot | 2022-08-25 18:54:58 UTC | Link ID | Github red-hat-storage/rook/pull/406 | |
| Prasad Desala | 2022-08-26 05:48:46 UTC | QA Contact | nberry | mashetty |
| krishnaram Karthick | 2022-08-26 11:54:00 UTC | CC | kramdoss | |
| RHEL Program Management | 2022-08-26 11:54:06 UTC | Target Release | --- | ODF 4.10.6 |
| Mudit Agarwal | 2022-08-26 11:56:50 UTC | Link ID | Github red-hat-storage/rook/pull/405 | |
| Sunil Kumar Acharya | 2022-08-26 11:58:43 UTC | Flags | needinfo?(tnielsen) | |
| OpenShift BugZilla Robot | 2022-08-26 13:57:38 UTC | Status | POST | MODIFIED |
| Travis Nielsen | 2022-08-26 13:59:10 UTC | Flags | needinfo?(tnielsen) | |
| errata-xmlrpc | 2022-08-30 05:36:45 UTC | Status | MODIFIED | ON_QA |
| Sunil Kumar Acharya | 2022-09-06 08:38:18 UTC | Flags | needinfo?(tnielsen) | |
| Travis Nielsen | 2022-09-06 19:38:38 UTC | Doc Type | If docs needed, set a value | Bug Fix |
| Travis Nielsen | 2022-09-06 19:38:38 UTC | Doc Text | Cause: If the operator is restarted in the middle of a mon failover, multiple mons may be started on the same node, which reduces the mon quorum availability. Consequence: Two mons could end up on the same node instead of spreading the mons across unique nodes. Fix: The operator will 1) properly cancel mon failover if the mon failover times out, and 2) ensure that any extra mons are removed based on stretch topology or multiple mons running on the same node. Result: Mon quorum will maintain proper spread across nodes and stretch topology. | |
| Travis Nielsen | 2022-09-12 18:00:00 UTC | Flags | needinfo?(tnielsen) | |
| Mahesh Shetty | 2022-09-20 09:39:44 UTC | Flags | needinfo?(tnielsen) | |
| Olive Lakra | 2022-09-20 14:50:11 UTC | CC | olakra | |
| Olive Lakra | 2022-09-20 14:50:11 UTC | Doc Text | Cause: If the operator is restarted in the middle of a mon failover, multiple mons may be started on the same node, which reduces the mon quorum availability. Consequence: Two mons could end up on the same node instead of spreading the mons across unique nodes. Fix: The operator will 1) properly cancel mon failover if the mon failover times out, and 2) ensure that any extra mons are removed based on stretch topology or multiple mons running on the same node. Result: Mon quorum will maintain proper spread across nodes and stretch topology. | Previously, the Ceph cluster would become unresponsive when two nodes of same zone is down in a stretch cluster. If the operator restarts in the middle of a mon failover, then many mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of spreading itself across unique nodes. With this update, the operator can now cancel the mon failover when the mon failover times out, and ensures that extra mons get removed based on stretch topology or many mons running on the same node. Resulting in mon quorum maintaining proper spread across nodes and stretch topology. |
| Travis Nielsen | 2022-09-20 16:52:17 UTC | Flags | needinfo?(tnielsen) | |
| Travis Nielsen | 2022-09-20 16:58:16 UTC | Doc Text | Previously, the Ceph cluster would become unresponsive when two nodes of same zone is down in a stretch cluster. If the operator restarts in the middle of a mon failover, then many mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of spreading itself across unique nodes. With this update, the operator can now cancel the mon failover when the mon failover times out, and ensures that extra mons get removed based on stretch topology or many mons running on the same node. Resulting in mon quorum maintaining proper spread across nodes and stretch topology. | Previously, the Ceph cluster would become unresponsive when two nodes of same zone are down in a stretch cluster. If the operator restarts in the middle of a mon failover, then multiple mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of being spread across unique nodes. With this update, the operator can now cancel the mon failover when the mon failover times out. And in the event that an extra mon is started during an operator restart, the extra mon will be removed based on topology to ensure extra mons are not running on the same node or in the same zone, to maintain optimal topology spread. |
| Mahesh Shetty | 2022-09-21 04:10:00 UTC | Status | ON_QA | VERIFIED |
| Olive Lakra | 2022-09-21 05:06:54 UTC | Doc Text | Previously, the Ceph cluster would become unresponsive when two nodes of same zone are down in a stretch cluster. If the operator restarts in the middle of a mon failover, then multiple mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of being spread across unique nodes. With this update, the operator can now cancel the mon failover when the mon failover times out. And in the event that an extra mon is started during an operator restart, the extra mon will be removed based on topology to ensure extra mons are not running on the same node or in the same zone, to maintain optimal topology spread. | Previously, the Ceph cluster would become unresponsive when two nodes of same zone are down in a stretch cluster. If the operator restarts in the middle of a mon failover, then multiple mons may get started on the same node, reducing the mon quorum availability. Thus, two mons could end up on the same node instead of being spread across unique nodes. With this update, the operator can now cancel the mon failover when the mon failover times out. And in the event that an extra mon is started during an operator restart, the extra mon will be removed based on topology to ensure these extra mons are not running on the same node or in the same zone, to maintain optimal topology spread. |
| errata-xmlrpc | 2022-09-21 10:41:57 UTC | Status | VERIFIED | RELEASE_PENDING |
| errata-xmlrpc | 2022-09-21 17:29:37 UTC | Status | RELEASE_PENDING | CLOSED |
| errata-xmlrpc | 2022-09-21 17:29:37 UTC | Resolution | --- | ERRATA |
| errata-xmlrpc | 2022-09-21 17:29:37 UTC | Last Closed | 2022-09-21 17:29:37 UTC | |
| errata-xmlrpc | 2022-09-21 17:29:43 UTC | Link ID | Red Hat Product Errata RHBA-2022:6675 | |
| Elad | 2023-08-09 17:03:01 UTC | CC | odf-bz-bot | |