Bug 2142174

Summary: mon/Elector: notify_rank_removed erase rank from both live_pinging and dead_pinging sets for highest ranked MON
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Kamoltat (Junior) Sirivadhna <ksirivad>
Component: RADOS Assignee: Kamoltat (Junior) Sirivadhna <ksirivad>
Status: CLOSED ERRATA QA Contact: Pawan <pdhiran>
Severity: medium Docs Contact: Akash Raj <akraj>
Priority: unspecified    
Version: 5.0 CC: akraj, akupczyk, amathuri, bhubbard, bkunal, ceph-eng-bugs, cephqe-warriors, choffman, ksirivad, lflores, muagarwa, nojha, pdhange, pdhiran, rfriedma, rzarzyns, sostapov, sseshasa, sunnagar, vereddy, vumrao
Target Milestone: ---   
Target Release: 5.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-16.2.10-85.el8cp Doc Type: Bug Fix
Doc Text:
.Rank is removed from the `live_pinging` and `dead_pinging` sets to mitigate the inconsistent connectivity score issue
Previously, when two monitors were removed consecutively and the removed rank was equal to Paxos's size, the monitor skipped the cleanup logic and did not remove the rank from the `dead_pinging` set. The rank therefore remained in the `dead_pinging` set, which caused problems such as an inconsistent connectivity score when stretch-cluster mode was enabled. With this fix, a case is added for removal of the highest ranked monitor, that is, when the rank is equal to Paxos's size: the rank is removed from both the `live_pinging` and `dead_pinging` sets. The monitor stays healthy with clean `live_pinging` and `dead_pinging` sets.
Story Points: ---
Clone Of: 2142143 Environment:
Last Closed: 2023-01-11 17:42:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2142143    
Bug Blocks: 2121452, 2126049, 2142674, 2142983, 2150223    

Description Kamoltat (Junior) Sirivadhna 2022-11-11 21:54:42 UTC
+++ This bug was initially created as a clone of Bug #2142143 +++

Description of problem:

Added a case for removing the highest ranked monitor
in notify_rank_removed. The old version did not handle this,
since it only entered the loop when rank_removed < paxos_size().
We therefore added an else case for rank_removed == paxos_size(),
in which we erase the rank from both the live_pinging and dead_pinging sets.
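
The branch structure described above can be sketched as follows. This is a minimal illustration, not the Ceph source: `PingState` and `on_rank_removed` are hypothetical names, `paxos_size` is passed in as a plain integer and is assumed to be the monitor count after the removal, and the body of the `rank_removed < paxos_size` branch is a simplified stand-in (drop the removed rank, shift higher ranks down by one) for the real loop.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the elector's ping bookkeeping (not the Ceph class).
struct PingState {
  std::set<int> live_pinging;
  std::set<int> dead_pinging;
};

// Assumed helper: drop `removed` from a set and shift every higher rank down by one.
inline std::set<int> shift_down(const std::set<int>& ranks, int removed) {
  std::set<int> out;
  for (int r : ranks) {
    if (r == removed) continue;          // the removed rank disappears
    out.insert(r > removed ? r - 1 : r); // higher ranks move down
  }
  return out;
}

// Sketch of the cleanup. Assumption: `paxos_size` is the monitor count after
// the removal, so removing the highest ranked monitor gives
// rank_removed == paxos_size.
inline void on_rank_removed(PingState& s, int rank_removed, int paxos_size) {
  assert(rank_removed >= 0 && rank_removed <= paxos_size);
  if (rank_removed < paxos_size) {
    // Simplified stand-in for the existing loop over the higher ranks.
    s.live_pinging = shift_down(s.live_pinging, rank_removed);
    s.dead_pinging = shift_down(s.dead_pinging, rank_removed);
  } else {
    // The added case: no higher ranks exist, so the loop would never run.
    // Erase the removed rank from both sets explicitly.
    s.live_pinging.erase(rank_removed);
    s.dead_pinging.erase(rank_removed);
  }
}
```

Without the else branch, a call such as `on_rank_removed(s, 2, 2)` would leave rank 2 behind in `dead_pinging`, which is the stale entry this bug describes.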

Comment 1 Veera Raghava Reddy 2022-11-14 12:14:29 UTC
Hi Scott,
Looks like this bug is for ODF. Can you review if this is a blocker for 5.3 or can be deferred to 5.3z1?

Comment 2 Vikhyat Umrao 2022-11-15 15:39:56 UTC
(In reply to Veera Raghava Reddy from comment #1)
> Hi Scott,
> Looks like this bug is for ODF. Can you review if this is a blocker for 5.3
> or can be deferred to 5.3z1?

Yesterday, I had a discussion with Junior and Neha. As we are giving the ODF customer a hotfix, this can be taken out of 5.3, because we won't be able to meet the 5.3 timelines!

Comment 30 Kamoltat (Junior) Sirivadhna 2023-01-11 10:24:38 UTC
Hi Akash,

here is the doc text,

Thank you!

Comment 31 errata-xmlrpc 2023-01-11 17:42:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 security update and Bug Fix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0076