Bug 2106031

Summary: [cee/sd][ceph-mgr] ceph-mgr daemon got lost from ceph status
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Prasanth M V <pmv>
Component: RADOS
Assignee: Nitzan mordechai <nmordech>
Status: CLOSED ERRATA
QA Contact: skanta
Severity: medium
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 4.3
CC: akraj, akupczyk, amathuri, bhubbard, bkunal, ceph-eng-bugs, cephqe-warriors, chaekim, choffman, gjose, hchatter, kdreyer, ksirivad, lflores, lithomas, mcaldeir, nmordech, nojha, pdhange, rfriedma, rzarzyns, skanta, sorkim, sseshasa, vumrao
Target Release: 7.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: ceph-18.2.0-1
Doc Type: Bug Fix
Doc Text:
.MGR no longer disconnects from the cluster without retries

Previously, during network issues, if `monclient` failed to authenticate, the MGR could disconnect from the cluster without retrying. With this fix, retries are added for cases where both hunting (searching for a monitor) and connection fail.
Last Closed: 2023-12-13 15:18:57 UTC
Type: Bug
Bug Blocks: 2171847, 2237662    

Description Prasanth M V 2022-07-11 15:01:46 UTC
Description of problem:

- The customer is running 3 ceph-mgr daemons.
- "ceph -s" reported no active mgr even though the services were up and running on all 3 nodes.
- The customer later restarted the ceph-mgr services on two nodes, which fixed the issue for those two nodes: "ceph -s" then reported one active and one standby mgr in the MGR group.
- The restart has not yet been done on the third mgr node.
- Restarting ceph-mgr will probably fix the issue on the third node as well.


Version-Release number of selected component (if applicable):
- Red Hat Ceph Storage 4.3 (ceph version 14.2.22-110.el8cp)

Comment 67 Nitzan mordechai 2023-10-19 08:23:50 UTC
@akraj 
Hi Akash, yes, it will be added to 7.0. We can mention it in the release notes (RN). It is not too important, but it will affect operators.

"Clusters can lose their connection with the MGR when there are network issues and monclient fails to authenticate. In that situation, the MGR could disconnect from the cluster without retrying.
The fix adds retries when both hunting and connection fail."

Comment 69 errata-xmlrpc 2023-12-13 15:18:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

Comment 70 Red Hat Bugzilla 2024-06-12 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days