Bug 2106031 - [cee/sd][ceph-mgr]ceph-mgr daemon got lost from ceph status
Summary: [cee/sd][ceph-mgr]ceph-mgr daemon got lost from ceph status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.0
Assignee: Nitzan mordechai
QA Contact: skanta
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks: 2171847 2237662
Reported: 2022-07-11 15:01 UTC by Prasanth M V
Modified: 2024-06-12 04:25 UTC
CC List: 25 users

Fixed In Version: ceph-18.2.0-1
Doc Type: Bug Fix
Doc Text:
.MGR no longer disconnects from the cluster without retries
Previously, during network issues, clusters could lose their connection to the MGR without any retries, and `monclient` authentication would fail. With this fix, retries are added for scenarios where both hunting and connection fail.
Clone Of:
Environment:
Last Closed: 2023-12-13 15:18:57 UTC
Embargoed:


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker           58379           0        None      None    None     2023-01-12 00:57:32 UTC
Red Hat Issue Tracker              RHCEPH-4739     0        None      None    None     2022-07-11 15:08:01 UTC
Red Hat Knowledge Base (Solution)  7055829         0        None      None    None     2024-02-12 18:25:48 UTC
Red Hat Product Errata             RHBA-2023:7780  0        None      None    None     2023-12-13 15:19:10 UTC

Description Prasanth M V 2022-07-11 15:01:46 UTC
Description of problem:

- The customer is running three ceph-mgr daemons.
- "ceph -s" reported no active mgr even though the ceph-mgr services were up and running on all three nodes.
- The customer later restarted the ceph-mgr services on two nodes, which fixed the issue for those two nodes: "ceph -s" then reported one active and one standby mgr.
- The ceph-mgr service on the third node has not been restarted yet.
- Restarting ceph-mgr on the third node will probably fix the issue there as well.
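
The check described in the bullets above can be scripted. The following is a minimal Python sketch, not taken from this report. It assumes that `ceph mgr stat` prints JSON containing `available`, `active_name`, and `num_standby` fields (names as seen in upstream Ceph; they may differ between releases), and the function names are purely illustrative.

```python
import json
import subprocess


def summarize_mgr_stat(raw_json):
    """Return (healthy, message) from the JSON text of `ceph mgr stat`.

    Assumption: the JSON has "available", "active_name" and "num_standby"
    keys. Missing keys are treated as "no active mgr".
    """
    stat = json.loads(raw_json)
    available = bool(stat.get("available", False))
    active = stat.get("active_name") or ""
    standbys = int(stat.get("num_standby", 0))
    if available and active:
        return True, f"active mgr: {active}, standbys: {standbys}"
    return False, "no active mgr reported; check the ceph-mgr services"


def fetch_mgr_stat():
    """Run `ceph mgr stat` on a node that has an admin keyring."""
    result = subprocess.run(
        ["ceph", "mgr", "stat", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout
```

On a cluster node, `summarize_mgr_stat(fetch_mgr_stat())` would give a quick yes/no answer before and after restarting a ceph-mgr service. The parsing is kept separate from the command invocation so it can be exercised without a running cluster.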


Version-Release number of selected component (if applicable):
- Red Hat Ceph Storage 4.3 (ceph version 14.2.22-110.el8cp)

Comment 67 Nitzan mordechai 2023-10-19 08:23:50 UTC
@akraj 
Hi Akash, yes, it will be added to 7.0. We can mention it in the release notes; it is not too important, but it will affect operators.

"Clusters can lose connection with the MGR when there is a network issue and monclient fails to authenticate. In that situation, the MGR could disconnect from the cluster without retries.
The fix adds a retry when both hunting and connection fail."
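
The retry behaviour described in the quoted note can be illustrated with a generic sketch. This is not Ceph's MonClient code, which is C++. It is a hypothetical Python example that assumes one "hunting" pass tries each candidate monitor in turn, and that a failed pass is followed by another pass after a delay instead of giving up. All names (`connect_with_retries`, `try_connect`, `max_rounds`) are made up for illustration.

```python
import time


def connect_with_retries(candidates, try_connect, max_rounds=3,
                         delay_seconds=1.0, sleep=time.sleep):
    """Return the first candidate that connects, or None if every round fails.

    candidates:  hypothetical list of monitor names to try in each pass.
    try_connect: hypothetical callable returning True when the connection
                 and authentication to the given candidate succeed.
    """
    for round_number in range(1, max_rounds + 1):
        # One "hunting" pass: try every candidate once.
        for candidate in candidates:
            if try_connect(candidate):
                return candidate
        # The whole pass failed; wait and retry rather than giving up.
        if round_number < max_rounds:
            sleep(delay_seconds * round_number)  # simple linear backoff
    return None
```

The `sleep` parameter is injectable so the loop can be exercised without real delays, for example `connect_with_retries(["mon-a", "mon-b"], my_probe, sleep=lambda s: None)`, where `my_probe` is whatever connectivity check the caller supplies.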

Comment 69 errata-xmlrpc 2023-12-13 15:18:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

Comment 70 Red Hat Bugzilla 2024-06-12 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

