Description of problem:
Observing that the cluster enters HEALTH_ERR state after removal of hosts.

# ceph -s
  cluster:
    id:     a6832daa-ccab-11ed-afae-fa163ec3bb1a
    health: HEALTH_ERR
            failed to probe daemons or devices
            Module 'cephadm' has failed: 'ceph-pdhiran-o85tl0-node9'

  services:
    mon: 4 daemons, quorum ceph-pdhiran-o85tl0-node2,ceph-pdhiran-o85tl0-node6,ceph-pdhiran-o85tl0-node7,ceph-pdhiran-o85tl0-node11 (age 41m)
    mgr: ceph-pdhiran-o85tl0-node6.hoglst(active, since 5h), standbys: ceph-pdhiran-o85tl0-node2.zwnqgb
    mds: 1/1 daemons up, 1 standby
    osd: 24 osds: 24 up (since 64m), 24 in (since 112m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 449 pgs
    objects: 250 objects, 32 KiB
    usage:   1.0 GiB used, 599 GiB / 600 GiB avail
    pgs:     449 active+clean

# ceph orch host ls
HOST                                 ADDR          LABELS                    STATUS
ceph-pdhiran-o85tl0-node1-installer  10.0.210.201  _admin
ceph-pdhiran-o85tl0-node10           10.0.208.231  osd
ceph-pdhiran-o85tl0-node11           10.0.211.10   mon rgw
ceph-pdhiran-o85tl0-node12           10.0.208.219  osd-bak                   _no_schedule
ceph-pdhiran-o85tl0-node13           10.0.210.103  osd-bak                   _no_schedule
ceph-pdhiran-o85tl0-node2            10.0.208.239  mon mds alertmanager mgr
ceph-pdhiran-o85tl0-node3            10.0.211.84   osd
ceph-pdhiran-o85tl0-node4            10.0.209.113  osd
ceph-pdhiran-o85tl0-node5            10.0.210.89   osd
ceph-pdhiran-o85tl0-node6            10.0.209.209  mon mds mgr
ceph-pdhiran-o85tl0-node7            10.0.210.41   mon rgw
11 hosts in cluster

# ceph health detail
HEALTH_ERR failed to probe daemons or devices; Module 'cephadm' has failed: 'ceph-pdhiran-o85tl0-node9'
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
    host ceph-pdhiran-o85tl0-node9 `cephadm ceph-volume` failed: host address is empty
    host ceph-pdhiran-o85tl0-node9 `cephadm list-networks` failed: host address is empty
    host ceph-pdhiran-o85tl0-node9 `cephadm gather-facts` failed: host address is empty
[ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed: 'ceph-pdhiran-o85tl0-node9'
    Module 'cephadm' has failed: 'ceph-pdhiran-o85tl0-node9'

Version-Release number of selected component (if applicable):
ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)

How reproducible:
1/1

Steps to Reproduce:
1. Deploy an RHCS 5.3 cluster, create pools, and write data.
2. Prepare to remove one OSD host from the cluster.
3. Perform the drain operation on the host.

# ceph orch host drain ceph-pdhiran-o85tl0-node9
Scheduled to remove the following daemons from host 'ceph-pdhiran-o85tl0-node9'
type                 id
-------------------- ---------------
crash                ceph-pdhiran-o85tl0-node9
node-exporter        ceph-pdhiran-o85tl0-node9
osd                  14
osd                  4
osd                  9
osd                  19

4. Once the drain is complete, remove the host from the cluster.

# ceph orch osd rm status -f json
[{"drain_done_at": "2023-03-28T12:02:33.990155Z", "drain_started_at": "2023-03-28T12:02:23.183372Z", "drain_stopped_at": null, "draining": false, "force": false, "hostname": "ceph-pdhiran-o85tl0-node9", "osd_id": 9, "process_started_at": "2023-03-28T12:02:00.879414Z", "replace": false, "started": true, "stopped": false, "zap": false}]

# ceph orch osd rm status -f json
No OSD remove/replace operations reported

# ceph orch host rm ceph-pdhiran-o85tl0-node9
Removed host 'ceph-pdhiran-o85tl0-node9'

5. After some time, observe that the cluster is in HEALTH_ERR state.

Actual results:
Cluster is in HEALTH_ERR state after OSD host removal.

Expected results:
Cluster should be HEALTH_OK after the removal operation.

Additional info:
Failing over the active mgr with `ceph mgr fail` clears the health error and the cluster returns to HEALTH_OK.
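For reference, the gating check in step 4 (wait for the drain to finish before running `ceph orch host rm`) can be sketched from the JSON that `ceph orch osd rm status -f json` emits. This is a minimal, hypothetical helper written for this report, not part of cephadm; the sample payload is the one captured above, and an entry is treated as done once `draining` is false and `drain_done_at` is set.

```python
import json

# Sample output of `ceph orch osd rm status -f json`, copied from the report above.
SAMPLE = (
    '[{"drain_done_at": "2023-03-28T12:02:33.990155Z", '
    '"drain_started_at": "2023-03-28T12:02:23.183372Z", '
    '"drain_stopped_at": null, "draining": false, "force": false, '
    '"hostname": "ceph-pdhiran-o85tl0-node9", "osd_id": 9, '
    '"process_started_at": "2023-03-28T12:02:00.879414Z", '
    '"replace": false, "started": true, "stopped": false, "zap": false}]'
)

def drains_finished(status_json: str, hostname: str) -> bool:
    """Return True when no OSD on `hostname` is still draining.

    Hypothetical helper: `ceph orch osd rm status -f json` reports one entry
    per OSD being removed; we count an entry as done when `draining` is false
    and `drain_done_at` is populated. (An empty list also counts as done,
    matching the later "No OSD remove/replace operations reported" state.)
    """
    entries = [e for e in json.loads(status_json) if e["hostname"] == hostname]
    return all(not e["draining"] and e["drain_done_at"] for e in entries)

print(drains_finished(SAMPLE, "ceph-pdhiran-o85tl0-node9"))  # True for the sample above
```

In a wrapper script one would poll this predicate and only then issue `ceph orch host rm`; the bug reported here occurs even with that ordering respected, since the drain had completed before the host was removed.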