.`MonClient` no longer fails to authenticate with `EAGAIN`
Previously, if `MonClient` failed to authenticate with `EAGAIN`, it could reach an invalid state in which it had no active connection to `ceph-mon` and made no further attempt to acquire one. As a result, even though the Ceph Manager daemon was technically alive, it became invisible to the monitors in the cluster.
With this fix, authentication failures with `EAGAIN` are handled properly and the Ceph Manager daemon remains visible to the monitors as expected.
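For illustration only, below is a minimal sketch (in Python, not Ceph's actual C++ `MonClient` code) of the retry behavior this kind of fix implies: a transient `EAGAIN` from the authentication handshake should trigger a bounded retry instead of leaving the client with no monitor session. All names here (`authenticate_once`, `MAX_RETRIES`, `RETRY_DELAY_SECS`) are hypothetical.

import errno
import time

MAX_RETRIES = 5          # hypothetical retry budget
RETRY_DELAY_SECS = 1.0   # hypothetical delay between attempts

def authenticate_with_retry(authenticate_once):
    """Keep calling `authenticate_once()` while it returns -EAGAIN.

    `authenticate_once` is a hypothetical callable returning 0 on success or
    a negative errno on failure, mirroring the return convention used by many
    Ceph client paths.
    """
    for _ in range(MAX_RETRIES):
        rc = authenticate_once()
        if rc == 0:
            return 0                    # authenticated; monitor session established
        if rc != -errno.EAGAIN:
            return rc                   # hard failure; surface it to the caller
        # Transient failure: retry instead of dropping into a state with no
        # active monitor connection and no further attempts to acquire one.
        time.sleep(RETRY_DELAY_SECS)
    return -errno.EAGAIN                # retries exhausted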
Description of problem:
=======================
On running the "ceph -s" or "ceph mgr stat" commands, the output lists only the active MGR daemon.
The information about the standby MGR daemon is missing.
[This gives the impression that only one MGR daemon is running in the cluster.]
Version-Release number of selected component (if applicable):
=============================================================
17.2.6-21.el9cp
How reproducible:
=================
Always
Steps to Reproduce:
===================
1. Deploy a RHCS 6.1 cluster with the dashboard enabled and a minimum of 2 MGR daemons
2. From the CLI, run the following commands:
# ceph -s
# ceph mgr stat
Actual results:
===============
Even though there are two MGR daemons up and running, the "ceph -s" command output does not list the standby daemon info:
[ceph: root@ceph-saraut-6-1-ickncj-node1-installer /]# ceph orch ps --daemon-type=mgr
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
mgr.ceph-saraut-6-1-ickncj-node1-installer.zonjdd ceph-saraut-6-1-ickncj-node1-installer *:9283 running (55m) 58s ago 55m 491M - 17.2.6-21.el9cp 777a3e7e474d 1fff81ae780f
mgr.ceph-saraut-6-1-ickncj-node3.duegqa ceph-saraut-6-1-ickncj-node3 *:8443,9283 running (53m) 60s ago 53m 479M - 17.2.6-21.el9cp 777a3e7e474d 850fa458d9a9
[ceph: root@ceph-saraut-6-1-ickncj-node1-installer /]# ceph -s
  cluster:
    id:     65ab9ea6-dcf6-11ed-9a44-fa163eba73f2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-saraut-6-1-ickncj-node1-installer,ceph-saraut-6-1-ickncj-node3,ceph-saraut-6-1-ickncj-node2 (age 52m)
    mgr: ceph-saraut-6-1-ickncj-node3.duegqa(active, since 21m)
    mds: 1/1 daemons up, 1 standby
    osd: 18 osds: 18 up (since 50m), 18 in (since 51m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 273 pgs
    objects: 44.35k objects, 1.3 GiB
    usage:   6.5 GiB used, 263 GiB / 270 GiB avail
    pgs:     273 active+clean

  io:
    client:   71 KiB/s rd, 0 B/s wr, 71 op/s rd, 47 op/s wr
[ceph: root@ceph-saraut-6-1-ickncj-node1-installer /]# ceph mgr stat
{
    "epoch": 27,
    "available": true,
    "active_name": "ceph-saraut-6-1-ickncj-node3.duegqa",
    "num_standby": 0
}
Expected results:
=================
Information about the active as well as the standby daemon should be displayed in the output of the "ceph -s" and "ceph mgr stat" commands.
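As a quick post-fix verification, the JSON emitted by "ceph mgr stat" (as shown in the Actual results above) can be checked for a non-zero "num_standby" value. The sketch below assumes a host where the ceph CLI is available inside the shell; the helper name is hypothetical.

import json
import subprocess

def standby_mgr_count():
    """Return the standby manager count reported by `ceph mgr stat`."""
    out = subprocess.run(
        ["ceph", "mgr", "stat"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out).get("num_standby", 0)

if __name__ == "__main__":
    count = standby_mgr_count()
    print(f"standby mgr daemons reported: {count}")
    if count == 0:
        print("WARNING: no standby mgr reported; standby info may be missing")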
Additional info:
================
Observed the same behavior on ceph version 17.2.6-17.el9cp (871f491e0d45eb58a738a645e40bf10b95df45b9) quincy (stable) as well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:4473