Bug 2142983 - Ceph unresponsive after provoking failure in datacenter, no IO. Stretch Cluster internal mode.
Summary: Ceph unresponsive after provoking failure in datacenter, no IO. Stretch Cluster internal mode.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.0
Assignee: Kamoltat (Junior) Sirivadhna
QA Contact: Pawan
Docs Contact: Eliska
URL:
Whiteboard:
Depends On: 2142141 2142174 2142674
Blocks: 2126050
 
Reported: 2022-11-15 17:33 UTC by Vikhyat Umrao
Modified: 2023-03-20 19:00 UTC
CC List: 39 users

Fixed In Version: ceph-17.2.5-31.el9cp
Doc Type: Bug Fix
Doc Text:
.Ceph Monitors are not stuck during failover of a site

Previously, the `removed_ranks` variable was not cleared on every update of the monitor map. As a result, replacing monitors in a 2-site stretch cluster and then failing over one of the sites caused the connection scores, including the ranks associated with those scores, to become inconsistent. Inconsistent connection scores caused a deadlock during the monitor election period, which made Ceph unresponsive. Once this happened, there was no way for the monitor rank associated with the connection score to correct itself.

With this fix, the `removed_ranks` variable is cleared with every update of the monitor map. Monitors are no longer stuck in the election period, and Ceph no longer becomes unresponsive when monitors are replaced and a site fails over. In addition, the connection scores can be manually forced to correct themselves with the `ceph daemon mon._NAME_ connection scores reset` command.
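The failure mode described in the Doc Text can be illustrated with a small toy model. This is a hypothetical Python sketch, not Ceph's actual `Elector` or `MonMap` code: the classes `MonMapUpdate`, `ConnectionScores`, and `Monitor`, the `run()` helper, the score values, and the assumption that every rank above a removed rank shifts down by one are all illustrative. The method name `notify_rank_removed` is borrowed from the linked pull request titles, but its body here is invented. The sketch only shows how a `removed_ranks` set that survives across monitor map updates re-applies an old removal and misaligns per-rank scores; it does not model the election deadlock itself.

```python
# Hypothetical toy model, NOT Ceph's implementation: shows how a stale
# removed_ranks set can desynchronize per-rank connection scores.
from dataclasses import dataclass, field


@dataclass
class MonMapUpdate:
    """Hypothetical monmap update carrying only the ranks removed in this epoch."""
    epoch: int
    removed_ranks: set[int] = field(default_factory=set)


class ConnectionScores:
    """Hypothetical per-rank score table (assumption: ranks stay contiguous)."""

    def __init__(self, scores: dict[int, float]):
        self.scores = dict(scores)

    def notify_rank_removed(self, rank: int) -> None:
        # Drop the removed rank and renumber every higher rank down by one.
        shifted = {}
        for r, s in self.scores.items():
            if r < rank:
                shifted[r] = s
            elif r > rank:
                shifted[r - 1] = s
        self.scores = shifted


class Monitor:
    def __init__(self, scores: dict[int, float], clear_each_update: bool):
        self.tracker = ConnectionScores(scores)
        self.removed_ranks: set[int] = set()  # state carried between updates
        self.clear_each_update = clear_each_update

    def handle_monmap(self, update: MonMapUpdate) -> None:
        if self.clear_each_update:
            # Fixed behaviour: forget removals that belong to earlier epochs.
            self.removed_ranks.clear()
        self.removed_ranks |= update.removed_ranks
        # Apply the highest rank first so one removal does not renumber the next.
        for rank in sorted(self.removed_ranks, reverse=True):
            self.tracker.notify_rank_removed(rank)


def run(clear_each_update: bool) -> dict[int, float]:
    mon = Monitor({0: 0.9, 1: 0.8, 2: 0.7, 3: 0.6, 4: 0.5}, clear_each_update)
    mon.handle_monmap(MonMapUpdate(epoch=2, removed_ranks={1}))  # a monitor is replaced
    mon.handle_monmap(MonMapUpdate(epoch=3))                     # unrelated later update
    return mon.tracker.scores


print(run(clear_each_update=False))  # {0: 0.9, 1: 0.6, 2: 0.5}
print(run(clear_each_update=True))   # {0: 0.9, 1: 0.7, 2: 0.6, 3: 0.5}
```

Without clearing, the epoch 3 update removes rank 1 a second time, so the score that belonged to the old rank 3 is now attributed to rank 1 and one rank disappears entirely. Clearing the set on each update leaves the table matching the current monitor map.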
Clone Of: 2142674
Environment:
Last Closed: 2023-03-20 18:59:13 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph/ceph pull 48698 | 0 | None | open | [DNM][Downstream][hotfix]mon/Elector: Change how we handle removed_ranks and notify_rank_removed() | 2022-11-16 20:19:34 UTC
Github | ceph/ceph pull 49311 | 0 | None | open | quincy: mon/Elector: Change how we handle removed_ranks and notify_rank_removed() | 2022-12-14 21:53:21 UTC
Red Hat Issue Tracker | RHCEPH-5615 | 0 | None | None | None | 2022-11-15 17:40:57 UTC
Red Hat Product Errata | RHBA-2023:1360 | 0 | None | None | None | 2023-03-20 19:00:16 UTC

Comment 41 errata-xmlrpc 2023-03-20 18:59:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1360

