Back to bug 2142983
| Who | When | What | Removed | Added |
|---|---|---|---|---|
| Red Hat One Jira (issues.redhat.com) | 2022-11-15 17:40:57 UTC | Link ID | Red Hat Issue Tracker RHCEPH-5615 | |
| Kamoltat (Junior) Sirivadhna | 2022-11-16 20:19:34 UTC | Link ID | Github ceph/ceph/pull/48698 | |
| Neha Ojha | 2022-12-14 21:53:21 UTC | Link ID | Github ceph/ceph/pull/49311 | |
| Kamoltat (Junior) Sirivadhna | 2022-12-15 22:44:19 UTC | Status | ASSIGNED | POST |
| Ken Dreyer (Red Hat) | 2022-12-16 01:59:22 UTC | CC | kdreyer | |
| | | Fixed In Version | ceph-17.2.5-31.el9cp | |
| | | Status | POST | MODIFIED |
| Ken Dreyer (Red Hat) | 2022-12-16 02:35:33 UTC | Flags | needinfo?(pdhiran) | |
| Veera Raghava Reddy | 2022-12-16 04:58:20 UTC | CC | vereddy | |
| | | Flags | needinfo?(pdhiran) | |
| errata-xmlrpc | 2022-12-16 05:37:49 UTC | Status | MODIFIED | ON_QA |
| Eliska | 2022-12-19 12:50:47 UTC | CC | ekristov | |
| | | Flags | needinfo?(ksirivad) | |
| Kamoltat (Junior) Sirivadhna | 2022-12-19 21:36:14 UTC | Flags | needinfo?(ksirivad) | |
| | | Doc Text | Cause: The variable `removed_ranks` does not discard its content for every update of Monmap, therefore, replacing MONs in a 2-site stretch-cluster and failing over one of the sites cause connection scores (including ranks associated with the scores) to be inconsistent. Consequence: Inconsistent connection scores cause deadlock during the MON election period, causing Ceph to become unresponsive. Moreover, once this happens, there is no way for the MON rank associated with the connection score to correct itself. Fix: The variable `removed_ranks` gets cleared every update of the Monmap. Moreover, we added a way for the connection score to correct itself when executing the command `ceph daemon mon.{name} connection scores reset`. Result: MONs are no longer stuck in the election period and Ceph no longer becomes responsive when replacing monitors and failing over a site. Furthermore, we also have a way to manually force the connection scores to correct themselves. | |
| | | Doc Type | If docs needed, set a value | Bug Fix |
| Eliska | 2022-12-23 10:34:59 UTC | Flags | needinfo?(ksirivad) | |
| | | Doc Text | Cause: The variable `removed_ranks` does not discard its content for every update of Monmap, therefore, replacing MONs in a 2-site stretch-cluster and failing over one of the sites cause connection scores (including ranks associated with the scores) to be inconsistent. Consequence: Inconsistent connection scores cause deadlock during the MON election period, causing Ceph to become unresponsive. Moreover, once this happens, there is no way for the MON rank associated with the connection score to correct itself. Fix: The variable `removed_ranks` gets cleared every update of the Monmap. Moreover, we added a way for the connection score to correct itself when executing the command `ceph daemon mon.{name} connection scores reset`. Result: MONs are no longer stuck in the election period and Ceph no longer becomes responsive when replacing monitors and failing over a site. Furthermore, we also have a way to manually force the connection scores to correct themselves. | .Ceph Monitors are not stuck during failover of a site Previously, the `removed_ranks` variable would not discard its content for every update of the Monitor map. Thus it would replace monitors in a 2-site stretch cluster and fail over of one of the site would cause connection scores, including ranks associated with the scores, to be inconsistent. Inconsistent connection scores would cause deadlock during the monitor election period, which would result in Ceph to become unresponsive. Once this happened, there was no way for the monitor rank associated with the connection score to correct itself. With this fix, the `removed_ranks` variable gets cleared with every update of the monitor map. Monitors are no longer stuck in the election period and Ceph no longer becomes unresponsive when replacing monitors and failing over a site. Moreover, there is a way to manually force the connection scores to correct themselves with the `ceph daemon mon._NAME_ connection scores reset` command. |
| | | Docs Contact | ekristov | |
| Eliska | 2022-12-23 10:44:21 UTC | Blocks | 2126050 | |
| Kamoltat (Junior) Sirivadhna | 2022-12-27 10:17:59 UTC | Flags | needinfo?(ksirivad) | |
| Red Hat Bugzilla | 2022-12-31 19:04:25 UTC | CC | mashetty | |
| Red Hat Bugzilla | 2022-12-31 19:13:39 UTC | CC | amathuri | |
| Red Hat Bugzilla | 2022-12-31 19:32:48 UTC | CC | pdhiran | |
| | | QA Contact | pdhiran | |
| Red Hat Bugzilla | 2022-12-31 20:00:13 UTC | CC | sseshasa | |
| Red Hat Bugzilla | 2022-12-31 22:37:04 UTC | CC | ebenahar | |
| Red Hat Bugzilla | 2022-12-31 22:43:41 UTC | CC | rfriedma | |
| Red Hat Bugzilla | 2022-12-31 23:43:49 UTC | CC | rzarzyns | |
| Red Hat Bugzilla | 2022-12-31 23:46:05 UTC | CC | akupczyk | |
| Red Hat Bugzilla | 2023-01-01 05:35:34 UTC | Assignee | ksirivad | nojha |
| | | CC | ksirivad | |
| Red Hat Bugzilla | 2023-01-01 05:40:01 UTC | CC | tserlin | |
| Red Hat Bugzilla | 2023-01-01 05:47:22 UTC | CC | flucifre | |
| Red Hat Bugzilla | 2023-01-01 06:02:15 UTC | CC | bniver | |
| Red Hat Bugzilla | 2023-01-01 06:03:42 UTC | CC | kdreyer | |
| Red Hat Bugzilla | 2023-01-01 06:27:22 UTC | CC | lflores | |
| Red Hat Bugzilla | 2023-01-01 06:29:13 UTC | CC | choffman | |
| Red Hat Bugzilla | 2023-01-01 07:23:10 UTC | CC | tnielsen | |
| Red Hat Bugzilla | 2023-01-01 08:22:23 UTC | CC | vkolli | |
| Red Hat Bugzilla | 2023-01-01 08:30:02 UTC | CC | bkunal | |
| Red Hat Bugzilla | 2023-01-01 08:39:06 UTC | CC | nojha | |
| | | Assignee | nojha | nobody |
| Red Hat Bugzilla | 2023-01-01 08:40:01 UTC | CC | pdhange | |
| Red Hat Bugzilla | 2023-01-01 08:47:56 UTC | CC | vereddy | |
| Red Hat Bugzilla | 2023-01-01 08:50:24 UTC | CC | vumrao | |
| Pawan | 2023-01-02 16:30:07 UTC | QA Contact | pdhiran | |
| | | CC | pdhiran | |
| Alasdair Kergon | 2023-01-04 04:40:45 UTC | CC | akupczyk | |
| Alasdair Kergon | 2023-01-04 04:43:11 UTC | Assignee | nobody | ksirivad |
| Alasdair Kergon | 2023-01-04 04:43:34 UTC | CC | amathuri | |
| Alasdair Kergon | 2023-01-04 05:03:42 UTC | CC | kdreyer | |
| Alasdair Kergon | 2023-01-04 05:08:58 UTC | CC | ksirivad | |
| Alasdair Kergon | 2023-01-04 05:10:58 UTC | CC | lflores | |
| Alasdair Kergon | 2023-01-04 05:21:38 UTC | CC | nojha | |
| Alasdair Kergon | 2023-01-04 05:28:18 UTC | CC | pdhange | |
| Alasdair Kergon | 2023-01-04 05:34:52 UTC | CC | rfriedma | |
| Alasdair Kergon | 2023-01-04 05:37:37 UTC | CC | rzarzyns | |
| Alasdair Kergon | 2023-01-04 05:49:38 UTC | CC | tnielsen | |
| Alasdair Kergon | 2023-01-04 05:57:35 UTC | CC | vkolli | |
| Alasdair Kergon | 2023-01-04 05:59:30 UTC | CC | vumrao | |
| Alasdair Kergon | 2023-01-04 06:09:44 UTC | CC | bkunal | |
| Alasdair Kergon | 2023-01-04 06:11:25 UTC | CC | bniver | |
| Alasdair Kergon | 2023-01-04 06:13:47 UTC | CC | choffman | |
| Alasdair Kergon | 2023-01-04 06:29:04 UTC | CC | vereddy | |
| Alasdair Kergon | 2023-01-04 06:41:59 UTC | CC | ebenahar | |
| Alasdair Kergon | 2023-01-04 06:43:51 UTC | CC | flucifre | |
| Alasdair Kergon | 2023-01-04 06:50:47 UTC | CC | mashetty | |
| Alasdair Kergon | 2023-01-04 06:56:31 UTC | CC | sseshasa | |
| Sunil Kumar Nagaraju | 2023-01-06 11:49:42 UTC | CC | sunnagar | |
| Pawan | 2023-01-09 08:00:03 UTC | Status | ON_QA | VERIFIED |
| Red Hat Bugzilla | 2023-01-09 08:30:35 UTC | CC | ceph-eng-bugs | |
| Alasdair Kergon | 2023-01-09 19:43:36 UTC | CC | ceph-eng-bugs | |
| Red Hat Bugzilla | 2023-01-31 23:38:09 UTC | CC | madam | |
| errata-xmlrpc | 2023-03-20 18:59:13 UTC | CC | tserlin | |
| | | Group | private | |
| | | Resolution | --- | ERRATA |
| | | Status | VERIFIED | CLOSED |
| | | Last Closed | 2023-03-20 18:59:13 UTC | |
| errata-xmlrpc | 2023-03-20 19:00:16 UTC | Link ID | Red Hat Product Errata RHBA-2023:1360 | |
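The Doc Text entries above attribute the election deadlock to `removed_ranks` keeping its content across Monitor map updates. The following is a minimal Python sketch of that failure mode, assuming a simplified model in which each monitor stores connection scores keyed by peer rank and re-applies every rank in `removed_ranks` on each map update. `ScoreTable`, `MonitorModel`, `apply_removed_ranks`, and `handle_monmap_update` are hypothetical names chosen for illustration; this is not Ceph's C++ implementation, and it does not model the `ceph daemon mon.{name} connection scores reset` recovery command.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ScoreTable:
    """Hypothetical per-monitor table of connection scores, keyed by peer rank."""

    scores: dict[int, float] = field(default_factory=dict)

    def apply_removed_ranks(self, removed_ranks: set[int]) -> None:
        """Drop scores for removed ranks and shift higher ranks down.

        Removals are processed from the highest rank to the lowest, so every
        rank number in `removed_ranks` still refers to the numbering that was
        in effect before this update.
        """
        for removed in sorted(removed_ranks, reverse=True):
            shifted: dict[int, float] = {}
            for rank, score in self.scores.items():
                if rank == removed:
                    continue  # this monitor left the map; forget its score
                shifted[rank - 1 if rank > removed else rank] = score
            self.scores = shifted


class MonitorModel:
    """Hypothetical monitor that applies `removed_ranks` on every map update."""

    def __init__(self, scores: dict[int, float], clear_removed_ranks: bool) -> None:
        self.table = ScoreTable(dict(scores))  # copy so callers' dicts are untouched
        self.removed_ranks: set[int] = set()
        # True models the fixed behaviour; False models the reported bug.
        self.clear_removed_ranks = clear_removed_ranks

    def handle_monmap_update(self, newly_removed: set[int]) -> None:
        if self.clear_removed_ranks:
            # Fixed behaviour: start each map update with an empty set.
            self.removed_ranks.clear()
        # Without the clear, ranks removed in earlier updates are still in the
        # set and get applied again, dropping or shifting the wrong scores.
        self.removed_ranks |= newly_removed
        self.table.apply_removed_ranks(self.removed_ranks)


if __name__ == "__main__":
    initial = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.6}
    fixed = MonitorModel(initial, clear_removed_ranks=True)
    buggy = MonitorModel(initial, clear_removed_ranks=False)
    for mon in (fixed, buggy):
        mon.handle_monmap_update({1})    # update N: the monitor at rank 1 is removed
        mon.handle_monmap_update(set())  # update N+1: no monitor is removed
    print("fixed:", fixed.table.scores)
    print("buggy:", buggy.table.scores)
```

In this model the fixed monitor keeps `{0: 0.9, 1: 0.7, 2: 0.6}` after the second update, while the buggy one re-applies the stale removal of rank 1 and ends up with `{0: 0.9, 1: 0.6}`, silently discarding the score of the monitor that was originally at rank 2. That is one way the scores held by different monitors can drift apart, as the Doc Text describes for monitor replacement in a 2-site stretch cluster.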