Bug 2142983
Summary: | Ceph unresponsive after provoking failure in datacenter, no IO. Stretch Cluster internal mode. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vikhyat Umrao <vumrao> |
Component: | RADOS | Assignee: | Kamoltat (Junior) Sirivadhna <ksirivad> |
Status: | CLOSED ERRATA | QA Contact: | Pawan <pdhiran> |
Severity: | high | Docs Contact: | Eliska <ekristov> |
Priority: | unspecified | ||
Version: | 5.1 | CC: | akupczyk, amathuri, bhubbard, bkunal, bniver, ceph-eng-bugs, cephqe-warriors, choffman, ddomingu, ebenahar, ebonilla, Egarciad, ekristov, flucifre, gfarnum, jclaretm, kdreyer, ksirivad, lflores, mashetty, maugarci, mduasope, mgokhool, muagarwa, nojha, nravinas, ocs-bugs, pdhange, pdhiran, rfriedma, rzarzyns, sarora, sseshasa, sunnagar, tnielsen, tserlin, vereddy, vkolli, vumrao |
Target Milestone: | --- | ||
Target Release: | 6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | ceph-17.2.5-31.el9cp | Doc Type: | Bug Fix |
Doc Text: |
.Ceph Monitors are not stuck during failover of a site
Previously, the `removed_ranks` variable did not discard its content on every update of the Monitor map.
As a result, replacing monitors in a 2-site stretch cluster and then failing over one of the sites caused the connection scores, including the ranks associated with the scores, to become inconsistent.
Inconsistent connection scores caused a deadlock during the monitor election period, which made Ceph unresponsive.
Once this happened, there was no way for the monitor rank associated with the connection score to correct itself.
With this fix, the `removed_ranks` variable is cleared on every update of the Monitor map.
Monitors are no longer stuck in the election period, and Ceph no longer becomes unresponsive when replacing monitors and failing over a site.
In addition, you can manually force the connection scores to correct themselves with the `ceph daemon mon._NAME_ connection scores reset` command.
|
Story Points: | --- |
Clone Of: | 2142674 | Environment: | |
Last Closed: | 2023-03-20 18:59:13 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2142141, 2142174, 2142674 | ||
Bug Blocks: | 2126050 |
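Illustration of the failure mode described in the Doc Text. This is a minimal toy model, not Ceph's actual code: the class `ToyMonMap` and the helpers `drop_rank`, `apply_update`, and `simulate` are hypothetical names invented here, and the only identifier taken from the report is `removed_ranks`. It assumes that connection scores are keyed by monitor rank and that removing a rank shifts every higher rank down by one. Under those assumptions, a `removed_ranks` list that is not cleared gets replayed on the next map update, so scores end up attached to the wrong ranks.

```python
def drop_rank(scores, removed):
    """Return rank-keyed scores after the monitor at rank `removed` leaves."""
    shifted = {}
    for rank, score in scores.items():
        if rank == removed:
            continue  # the removed monitor's score is discarded
        # Ranks above the removed one move down by one position.
        shifted[rank - 1 if rank > removed else rank] = score
    return shifted


class ToyMonMap:
    """Hypothetical stand-in for a monitor map; not the real Ceph MonMap."""

    def __init__(self, clear_on_update):
        self.removed_ranks = []
        self.clear_on_update = clear_on_update

    def begin_update(self):
        # The fixed behaviour (assumed here): forget removals from earlier updates.
        if self.clear_on_update:
            self.removed_ranks.clear()

    def remove(self, rank):
        # A map update that removes one monitor.
        self.begin_update()
        self.removed_ranks.append(rank)


def apply_update(scores, monmap):
    """Replay every recorded removal against the rank-keyed scores."""
    for rank in monmap.removed_ranks:
        scores = drop_rank(scores, rank)
    return scores


def simulate(clear_on_update):
    # Each score is labelled with the monitor it belongs to.
    scores = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}
    monmap = ToyMonMap(clear_on_update)

    monmap.remove(1)                       # update 1: monitor "b" is removed
    scores = apply_update(scores, monmap)

    monmap.begin_update()                  # update 2: unrelated change, no removal
    scores = apply_update(scores, monmap)
    return scores


print("cleared on update:", simulate(True))
print("never cleared:    ", simulate(False))
```

In this model, the run without clearing drops the score for monitor "c" and leaves the scores for "d" and "e" under the wrong ranks, which corresponds to the inconsistency between ranks and connection scores that the Doc Text describes.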
Comment 41
errata-xmlrpc
2023-03-20 18:59:13 UTC
|