Bug 2142143 - mon/Elector: notify_rank_removed erase rank from both live_pinging and dead_pinging sets for highest ranked MON
Summary: mon/Elector: notify_rank_removed erase rank from both live_pinging and dead_p...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 5.0
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 6.0
Assignee: Kamoltat (Junior) Sirivadhna
QA Contact: Pawan
Eliska
URL:
Whiteboard:
Depends On:
Blocks: 2126050 2142174
 
Reported: 2022-11-11 19:24 UTC by Kamoltat (Junior) Sirivadhna
Modified: 2023-03-20 19:00 UTC (History)
19 users

Fixed In Version: ceph-17.2.5-2.el9cp
Doc Type: Bug Fix
Doc Text:
.The targeted `rank_removed` no longer gets stuck in `live_pinging` and `dead_pinging` states
Previously, in some cases, the `paxos_size` of the Monitor Map would get updated before the rank of the monitor was changed. For example, `paxos_size` would get reduced from 5 to 4, but the highest rank of the Monitors was still 4, thus the old code would skip deleting the rank from `dead_pinging` state. This would cause the targeted rank to remain in `dead_pinging` forever, which would then cause strange `peer_tracker` scores in election strategy: 3.
With this fix, a case is added when `rank_removed == paxos_size()` that erases the targeted `rank_removed` from both the `live_pinging` and `dead_pinging` states and the rank does not get stuck forever in either of these sets.
Clone Of:
: 2142174 (view as bug list)
Environment:
Last Closed: 2023-03-20 18:59:13 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 47086 0 None Merged quincy: mon/Elector: notify_rank_removed erase rank from both live_pinging and dead_pinging sets for highest ranked MON 2022-11-11 19:24:47 UTC
Red Hat Issue Tracker RHCEPH-5598 0 None None None 2022-11-11 19:29:35 UTC
Red Hat Product Errata RHBA-2023:1360 0 None None None 2023-03-20 19:00:16 UTC

Description Kamoltat (Junior) Sirivadhna 2022-11-11 19:24:48 UTC
Description of problem:

This change adds handling for removal of the highest-ranked monitor
in notify_rank_removed. The old version did not handle this case,
because it only entered the loop when rank_removed < paxos_size().
We therefore added an else case for rank_removed == paxos_size() that
erases the rank from both the live_pinging and dead_pinging sets.
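
The branch described above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual Elector code: PingState and notify_rank_removed_sketch are hypothetical names, paxos_size is modeled as a plain integer, and the rank-shifting loop for the rank_removed < paxos_size() case is deliberately left out.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the Elector's ping bookkeeping (not the Ceph class).
struct PingState {
  int paxos_size;              // monitor count, already updated by the new map
  std::set<int> live_pinging;  // ranks currently pinged as alive
  std::set<int> dead_pinging;  // ranks currently pinged as dead
};

// Returns true when the highest-rank branch handled the removal.
inline bool notify_rank_removed_sketch(PingState& s, int rank_removed) {
  assert(rank_removed >= 0);
  if (rank_removed < s.paxos_size) {
    // A lower rank was removed: the real code walks the higher ranks and
    // moves their ping state down by one. Omitted in this sketch.
    return false;
  }
  // Highest rank removed (rank_removed == paxos_size, because the map has
  // already shrunk): drop the rank from both sets so it cannot linger.
  s.live_pinging.erase(rank_removed);
  s.dead_pinging.erase(rank_removed);
  return true;
}
```

With paxos_size already reduced to 4 and rank 4 still present in dead_pinging, calling notify_rank_removed_sketch(state, 4) takes the second branch and leaves rank 4 in neither set, which is the behavior the fix is meant to guarantee.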

Comment 1 Kamoltat (Junior) Sirivadhna 2022-11-15 17:23:52 UTC
The patch is already in 6.0 as part of the rebase ... Moving to POST

Comment 8 Kamoltat (Junior) Sirivadhna 2022-11-29 03:50:34 UTC
modified docs LGTM!

Comment 10 Kamoltat (Junior) Sirivadhna 2022-12-14 19:43:58 UTC
Hi Pawan,

This fix doesn't address the issue you are showing, especially since there is a known issue: https://bugzilla.redhat.com/show_bug.cgi?id=2151501. Therefore, if no other issue is present here, I think it should be fine. The motivation behind this PR is to correctly remove entries from the live_pinging and dead_pinging sets. The messed-up score is probably part of https://bugzilla.redhat.com/show_bug.cgi?id=2151501

Comment 11 Ken Dreyer (Red Hat) 2022-12-14 21:46:59 UTC
Kamoltat, what is the next action you expect from Pawan?

Comment 33 errata-xmlrpc 2023-03-20 18:59:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1360

