Bug 2278778
Summary: | [7.1 Upgrade]: Upgrade of MGR in staggered approach also started upgrading NVMeoF service | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Sunil Kumar Nagaraju <sunnagar>
Component: | Cephadm | Assignee: | Adam King <adking>
Status: | CLOSED ERRATA | QA Contact: | Sunil Kumar Nagaraju <sunnagar>
Severity: | high | Docs Contact: | Akash Raj <akraj>
Priority: | unspecified | |
Version: | 7.1 | CC: | adking, akraj, cephqe-warriors, tserlin, vereddy
Target Milestone: | --- | |
Target Release: | 7.1 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | ceph-18.2.1-168.el9cp | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 2310839 (view as bug list) | Environment: |
Last Closed: | 2024-06-13 14:32:32 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2267614, 2298578, 2298579, 2310839 | |
Hello everyone, waiting for IBM Ceph 18.2.1-168 to verify the issue; with 18.2.1-167 the MGR upgrade still also upgrades the NVMeoF service daemons, as reported. - Thanks

Verified the BZ with RH 18.2.1-170. Upgrading the MGR with the staggered approach no longer upgrades NVMeoF, and the upgrade follows the order MGR --> MON --> CRASH --> OSD. Only once all Ceph-based daemons are upgraded does the final upgrade run pick up NVMeoF.

Order: mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> ceph-exporter -> iscsi -> nfs -> nvmeof

In my case, after calling the staggered upgrade for ceph-exporter, ceph-exporter was upgraded first, and only then did the NVMeoF daemons start upgrading. Hence marking this BZ as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925
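For reference, the staggered runs mentioned in the verification comment above are driven through cephadm's staggered-upgrade parameters. The following is only a minimal sketch of the per-stage sequence; the image spec is a placeholder, not the exact 18.2.1-170 pull spec used here:

# stage 1: managers only
ceph orch upgrade start --image <registry>/<ceph-image>:<target-tag> --daemon-types mgr
ceph orch upgrade status    # wait until the stage finishes before starting the next one
# stage 2: monitors and crash daemons
ceph orch upgrade start --image <registry>/<ceph-image>:<target-tag> --daemon-types mon,crash
# stage 3: OSDs
ceph orch upgrade start --image <registry>/<ceph-image>:<target-tag> --daemon-types osd
# final run with no daemon-type filter upgrades the remaining services (mds, rgw, ..., nvmeof last)
ceph orch upgrade start --image <registry>/<ceph-image>:<target-tag>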
Created attachment 2030955 [details]
gw-crash log

Description of problem:

I was upgrading a 4-node GW setup with the staggered approach and noticed 2 issues, but couldn't understand the crash.

Issue-1: Initially proceeded with the MGR upgrade only, using the staggered upgrade method, but noticed that after the MGR got updated, the nvmeof daemons started upgrading. Is this expected? Without the staggered upgrade approach, the enforced upgrade order is:

mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> ceph-exporter -> iscsi -> nfs -> nvmeof

Issue-2: Noticing crashes in the GW daemons.

May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 mgrc
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 dpdk
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 eventtrace
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 prioritycache
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 test
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 cephfs_mirror
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 cephsqlite
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_onode
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_odata
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_omap
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_tm
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_t
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_cleaner
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_epm
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_lba
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_fixedkv_tree
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_cache
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_journal
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_device
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 seastore_backref
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 alienstore
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 mclock
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 0/ 5 cyanstore
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 ceph_exporter
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 1/ 5 memstore
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: -2/-2 (syslog threshold)
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 99/99 (stderr threshold)
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: --- pthread ID / name mapping for recent threads ---
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 7fc34bb13640 / ms_dispatch
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 7fc34db17640 / msgr-worker-2
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 7fc34e318640 / msgr-worker-1
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 7fc34eb19640 / msgr-worker-0
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 7fc34fb795c0 / ceph-nvmeof-mon
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: max_recent 500
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: max_new 1000
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: log_file /var/lib/ceph/crash/2024-05-02T15:58:31.864166Z_3148cde5-2a00-4b9e-a9a7-f0bb9341803d/log
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: --- end dump of recent events ---
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: 2024-05-02T15:58:31.863+0000 7fc34fb795c0 0 nvmeofgw int NVMeofGwMonitorClient::init() Complete.
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 systemd-coredump[134690]: Process 134655 (ceph-nvmeof-mon) of user 0 dumped core.
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: [02-May-2024 15:58:31] ERROR server.py:42: GatewayServer: SIGCHLD received signum=17
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: [02-May-2024 15:58:31] ERROR server.py:108: GatewayServer exception occurred:
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: Traceback (most recent call last):
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/remote-source/ceph-nvmeof/app/control/__main__.py", line 43, in <module>
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: gateway.serve()
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/remote-source/ceph-nvmeof/app/control/server.py", line 165, in serve
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: self._start_monitor_client()
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/remote-source/ceph-nvmeof/app/control/server.py", line 223, in _start_monitor_client
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: self._wait_for_group_id()
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/remote-source/ceph-nvmeof/app/control/server.py", line 145, in _wait_for_group_id
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: self.monitor_event.wait()
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/usr/lib64/python3.9/threading.py", line 581, in wait
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: signaled = self._cond.wait(timeout)
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/usr/lib64/python3.9/threading.py", line 312, in wait
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: waiter.acquire()
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: File "/remote-source/ceph-nvmeof/app/control/server.py", line 54, in sigchld_handler
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: raise SystemExit(f"Gateway subprocess terminated {pid=} {exit_code=}")
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: SystemExit: Gateway subprocess terminated pid=18 exit_code=-6
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: [02-May-2024 15:58:31] INFO server.py:392: Aborting (client.nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node6.vtwjfa) pid 18...
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-vtwjfa[134635]: [02-May-2024 15:58:31] INFO server.py:129: Exiting the gateway process.
May 02 11:58:31 ceph-sunilkumar-00-bjcvqj-node6 ceph-a77239d0-0874-11ef-9ae9-fa163ea1f4b1-nvmeof-rbd-ceph-sunilkumar-00-bjcvqj-node6-

Version-Release number of selected component (if applicable):
Upgrading from IBM Ceph 18.2.1-149 to Ceph 18.2.1-159.
NVMeoF: 1.2.4-1 to 1.2.5-2.

How reproducible:

Steps to Reproduce:
1. Deploy a cluster on 18.2.1-149, configure 4 NVMe GWs, and run IO.
2. Start the upgrade with only the MGR using the staggered upgrade approach (see the command sketch after these steps).
3. Notice that the NVMeoF services also start upgrading after the MGR upgrade completes.
4. Stop the upgrade process and start it again; in my case this resulted in all GWs crashing.
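For steps 2 and 4, the commands would look roughly like the following. This is only a sketch; the image spec is a placeholder for the 18.2.1-159 build that was actually used:

# step 2: staggered upgrade of the MGR daemons only
ceph orch upgrade start --image <registry>/<ceph-image>:<18.2.1-159-tag> --daemon-types mgr
ceph orch upgrade status
# step 4: stop the running upgrade and start it again
ceph orch upgrade stop
ceph orch upgrade start --image <registry>/<ceph-image>:<18.2.1-159-tag> --daemon-types mgr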
Additional info:

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph -s
  cluster:
    id:     a77239d0-0874-11ef-9ae9-fa163ea1f4b1
    health: HEALTH_WARN
            4 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum ceph-sunilkumar-00-bjcvqj-node1-installer,ceph-sunilkumar-00-bjcvqj-node2,ceph-sunilkumar-00-bjcvqj-node3 (age 10h)
    mgr: ceph-sunilkumar-00-bjcvqj-node1-installer.xmrfdw(active, since 10h), standbys: ceph-sunilkumar-00-bjcvqj-node2.yqxjwf
    osd: 15 osds: 15 up (since 10h), 15 in (since 15h)

  data:
    pools:   2 pools, 129 pgs
    objects: 7.84k objects, 30 GiB
    usage:   93 GiB used, 282 GiB / 375 GiB avail
    pgs:     129 active+clean

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph versions
{
    "mon": {
        "ceph version 18.2.1-159.el9cp (5290a81d189b81ab463c73601c44f77a99f4107e) reef (stable)": 3
    },
    "mgr": {
        "ceph version 18.2.1-159.el9cp (5290a81d189b81ab463c73601c44f77a99f4107e) reef (stable)": 2
    },
    "osd": {
        "ceph version 18.2.1-159.el9cp (5290a81d189b81ab463c73601c44f77a99f4107e) reef (stable)": 15
    },
    "overall": {
        "ceph version 18.2.1-159.el9cp (5290a81d189b81ab463c73601c44f77a99f4107e) reef (stable)": 20
    }
}

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph orch ls
NAME                       PORTS             RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager               ?:9093,9094       1/1      4m ago     15h  count:1
ceph-exporter                                9/9      4m ago     15h  *
crash                                        9/9      4m ago     15h  *
grafana                    ?:3000            1/1      4m ago     15h  count:1
mgr                                          2/2      4m ago     15h  label:mgr
mon                                          3/3      4m ago     15h  label:mon
node-exporter              ?:9100            9/9      4m ago     15h  *
node-proxy                                   0/0      -          15h  *
nvmeof.rbd                 ?:4420,5500,8009  0/4      4m ago     12h  ceph-sunilkumar-00-bjcvqj-node6;ceph-sunilkumar-00-bjcvqj-node7;ceph-sunilkumar-00-bjcvqj-node8;ceph-sunilkumar-00-bjcvqj-node9
osd.all-available-devices                    15       4m ago     15h  *
prometheus                 ?:9095            1/1      4m ago     15h  count:1

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph orch ps | grep nvme
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node6.vtwjfa  ceph-sunilkumar-00-bjcvqj-node6  *:5500,4420,8009  error  5m ago  12h  -  -  <unknown>  <unknown>  <unknown>
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node7.vglfty  ceph-sunilkumar-00-bjcvqj-node7  *:5500,4420,8009  error  5m ago  12h  -  -  <unknown>  <unknown>  <unknown>
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node8.hnmfps  ceph-sunilkumar-00-bjcvqj-node8  *:5500,4420,8009  error  5m ago  12h  -  -  <unknown>  <unknown>  <unknown>
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node9.guwzdw  ceph-sunilkumar-00-bjcvqj-node9  *:5500,4420,8009  error  5m ago  12h  -  -  <unknown>  <unknown>  <unknown>

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph health detail
HEALTH_WARN 4 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node6.vtwjfa on ceph-sunilkumar-00-bjcvqj-node6 is in error state
    daemon nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node7.vglfty on ceph-sunilkumar-00-bjcvqj-node7 is in error state
    daemon nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node8.hnmfps on ceph-sunilkumar-00-bjcvqj-node8 is in error state
    daemon nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node9.guwzdw on ceph-sunilkumar-00-bjcvqj-node9 is in error state

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph crash info
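The last command above is cut off in this report. In general, the archived crash reports from the failed gateways can be listed first and then inspected by ID with the crash module (generic commands; the actual crash ID is not shown here):

ceph crash ls
ceph crash info <crash-id>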