Created attachment 2088847 [details]
Ceph Health NVMeoF WARNING invalid count

Description of problem:

Currently, the `Ceph Health NVMeoF WARNING` panel on the NVMe-oF Grafana overview dashboard does not report how many gateways have failed. It only reflects that the NVMEOF_GATEWAY_DOWN health warning is active in Ceph. In the health detail output below, 5 gateways are down, yet the `Ceph Health NVMeoF WARNING` panel always shows the value "1", which represents the NVMEOF_GATEWAY_DOWN warning itself rather than the number of gateways that are down in the system (a quick CLI cross-check of the expected count is sketched after the reproduction steps).

[ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm; 5 gateway(s) are in unavailable state; gateway might be down, try to redeploy.
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
    stray daemon nvmeof.ceph-sunilkumar-81-00-d6k85g-node5.duiuns on host ceph-sunilkumar-81-00-d6k85g-node5 not managed by cephadm
[WRN] NVMEOF_GATEWAY_DOWN: 5 gateway(s) are in unavailable state; gateway might be down, try to redeploy.
    NVMeoF Gateway 'client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node6.xndlff' is unavailable.
    NVMeoF Gateway 'client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node7.rlmenl' is unavailable.
    NVMeoF Gateway 'client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node8.qwbghd' is unavailable.
    NVMeoF Gateway 'client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node9.aagjid' is unavailable.
    NVMeoF Gateway 'client.nvmeof.rbd.group2.ceph-sunilkumar-81-00-d6k85g-node4.iedqbf' is unavailable.

Version-Release number of selected component (if applicable):
IBM Ceph 8.1
19.2.1-167.el9cp

How reproducible:
Always

Steps to Reproduce:
1. Deploy an IBM Ceph cluster.
2. Configure NVMe-oF with multiple gateways and their entities, from subsystems down to namespaces.
3. Go to Block --> NVMeoF --> Gateways --> Overview tab and check the `Ceph Health NVMeoF WARNING` panel; it always shows 1 when NVMEOF_GATEWAY_DOWN is fired.
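As a cross-check, the number the panel is expected to show can be read straight from the cluster. A minimal sketch, assuming jq is available in the cephadm shell; it counts the detail entries of the active health check, one per unavailable gateway:

# Count the gateways behind NVMEOF_GATEWAY_DOWN (sketch; assumes jq is installed).
ceph health detail --format json | \
    jq '.checks.NVMEOF_GATEWAY_DOWN.detail | length'
# With the state shown above this prints 5, which is what the panel should display.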
Actual results:
The dashboard panel shows an invalid value: it stays at 1 regardless of how many gateways are down.

Expected results:
The panel should show the number of gateways reported down by NVMEOF_GATEWAY_DOWN (5 in the output above).

Additional info:
Attaching a screenshot for reference.

[ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph orch host ls
HOST                                          ADDR         LABELS                    STATUS
ceph-sunilkumar-81-00-d6k85g-node1-installer  10.0.67.131  _admin,mon,mgr,installer
ceph-sunilkumar-81-00-d6k85g-node2            10.0.64.157  mon,mgr
ceph-sunilkumar-81-00-d6k85g-node3            10.0.67.187  mon,osd
ceph-sunilkumar-81-00-d6k85g-node4            10.0.66.183  mds,osd
ceph-sunilkumar-81-00-d6k85g-node5            10.0.67.29   mds,osd,rgw
ceph-sunilkumar-81-00-d6k85g-node6            10.0.64.71   nvmeof-gw
ceph-sunilkumar-81-00-d6k85g-node7            10.0.67.24   nvmeof-gw
ceph-sunilkumar-81-00-d6k85g-node8            10.0.66.65   nvmeof-gw
ceph-sunilkumar-81-00-d6k85g-node9            10.0.66.228  nvmeof-gw
9 hosts in cluster

[ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph orch ps --daemon-type nvmeof
NAME                                                         HOST                                PORTS                   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node6.xndlff         ceph-sunilkumar-81-00-d6k85g-node6  *:5500,4420,8009,10008  running (2d)  8m ago     2d   178M     -        1.4.7    96d48c1edeaf  58d7a9d0ace9
nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node7.rlmenl         ceph-sunilkumar-81-00-d6k85g-node7  *:5500,4420,8009,10008  running (2d)  8m ago     2d   176M     -        1.4.7    96d48c1edeaf  f6f7ac8da07b
nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node8.qwbghd         ceph-sunilkumar-81-00-d6k85g-node8  *:5500,4420,8009,10008  running (2d)  8m ago     2d   180M     -        1.4.7    96d48c1edeaf  968828243dc9
nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node9.aagjid         ceph-sunilkumar-81-00-d6k85g-node9  *:5500,4420,8009,10008  running (2d)  8m ago     2d   187M     -        1.4.7    96d48c1edeaf  84ee646d7526
nvmeof.rbd.group2.ceph-sunilkumar-81-00-d6k85g-node4.iedqbf  ceph-sunilkumar-81-00-d6k85g-node4  *:5500,4420,8009,10008  running (2d)  6m ago     2d   192M     -        1.4.7    96d48c1edeaf  e8d379319bb7
nvmeof.rbd.group2.ceph-sunilkumar-81-00-d6k85g-node5.duiuns  ceph-sunilkumar-81-00-d6k85g-node5  *:5500,4420,8009,10008  running (2d)  8m ago     2d   165M     -        1.4.7    96d48c1edeaf  1d1c192f34a6

[ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph orch ls
NAME                       PORTS                   RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager               ?:9093,9094                 1/1  8m ago     2d   count:1
ceph-exporter                                          9/9  8m ago     2d   *
crash                                                  9/9  8m ago     2d   *
grafana                    ?:3000                      1/1  8m ago     2d   count:1
mgr                                                    2/2  8m ago     2d   label:mgr
mon                                                    3/3  8m ago     2d   label:mon
node-exporter              ?:9100                      9/9  8m ago     2d   *
nvmeof.rbd                 ?:4420,5500,8009,10008      4/4  7m ago     46h  ceph-sunilkumar-81-00-d6k85g-node6;ceph-sunilkumar-81-00-d6k85g-node7;ceph-sunilkumar-81-00-d6k85g-node8;ceph-sunilkumar-81-00-d6k85g-node9
nvmeof.rbd.group2          ?:4420,5500,8009,10008      2/2  7m ago     46h  ceph-sunilkumar-81-00-d6k85g-node4;ceph-sunilkumar-81-00-d6k85g-node5
osd.all-available-devices                               12  7m ago     2d   *
prometheus                 ?:9095                      1/1  8m ago     2d   count:1

[ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph nvme-gw show rbd group1
{
    "epoch": 134,
    "pool": "rbd",
    "group": "group1",
    "features": "LB",
    "rebalance_ana_group": 4,
    "num gws": 4,
    "GW-epoch": 105,
    "Anagrp list": "[ 1 2 3 4 ]",
    "num-namespaces": 18,
    "Created Gateways:": [
        {
            "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node6.xndlff",
            "anagrp-id": 1,
            "num-namespaces": 5,
            "performed-full-startup": 1,
            "Availability": "AVAILABLE",
            "num-listeners": 2,
            "ana states": " 1: ACTIVE , 2: STANDBY , 3: STANDBY , 4: STANDBY "
        },
        {
            "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node7.rlmenl",
            "anagrp-id": 2,
            "num-namespaces": 4,
            "performed-full-startup": 1,
            "Availability": "AVAILABLE",
            "num-listeners": 2,
            "ana states": " 1: STANDBY , 2: ACTIVE , 3: STANDBY , 4: STANDBY "
        },
        {
"client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node8.qwbghd", "anagrp-id": 3, "num-namespaces": 5, "performed-full-startup": 1, "Availability": "AVAILABLE", "num-listeners": 2, "ana states": " 1: STANDBY , 2: STANDBY , 3: ACTIVE , 4: STANDBY " }, { "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node9.aagjid", "anagrp-id": 4, "num-namespaces": 4, "performed-full-startup": 1, "Availability": "AVAILABLE", "num-listeners": 2, "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: ACTIVE " } ] } [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph nvme-gw show rbd group2 { "epoch": 134, "pool": "rbd", "group": "group2", "features": "LB", "rebalance_ana_group": 2, "num gws": 2, "GW-epoch": 80, "Anagrp list": "[ 1 2 ]", "num-namespaces": 8, "Created Gateways:": [ { "gw-id": "client.nvmeof.rbd.group2.ceph-sunilkumar-81-00-d6k85g-node4.iedqbf", "anagrp-id": 1, "num-namespaces": 4, "performed-full-startup": 1, "Availability": "AVAILABLE", "num-listeners": 1, "ana states": " 1: ACTIVE , 2: STANDBY " }, { "gw-id": "client.nvmeof.rbd.group2.ceph-sunilkumar-81-00-d6k85g-node5.duiuns", "anagrp-id": 2, "num-namespaces": 4, "performed-full-startup": 1, "Availability": "AVAILABLE", "num-listeners": 1, "ana states": " 1: STANDBY , 2: ACTIVE " } ] } [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph nvme-gw show rbd '' { "epoch": 134, "pool": "rbd", "group": "", "features": "LB", "rebalance_ana_group": 4, "num gws": 4, "GW-epoch": 24, "Anagrp list": "[ 1 2 3 4 ]", "num-namespaces": 0, "Created Gateways:": [ { "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node6.xndlff", "anagrp-id": 1, "num-namespaces": 0, "performed-full-startup": 0, "Availability": "UNAVAILABLE", "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY " }, { "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node7.rlmenl", "anagrp-id": 2, "num-namespaces": 0, "performed-full-startup": 0, "Availability": "UNAVAILABLE", "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY " }, { "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node8.qwbghd", "anagrp-id": 3, "num-namespaces": 0, "performed-full-startup": 0, "Availability": "UNAVAILABLE", "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY " }, { "gw-id": "client.nvmeof.rbd.ceph-sunilkumar-81-00-d6k85g-node9.aagjid", "anagrp-id": 4, "num-namespaces": 0, "performed-full-startup": 0, "Availability": "UNAVAILABLE", "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY " } ] } [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# [ceph: root@ceph-sunilkumar-81-00-d6k85g-node1-installer /]# ceph status cluster: id: 2e83a2a8-296a-11f0-bf21-fa163e699b11 health: HEALTH_WARN 2 stray daemon(s) not managed by cephadm 4 gateway(s) are in unavailable state; gateway might be down, try to redeploy. 
  services:
    mon: 3 daemons, quorum ceph-sunilkumar-81-00-d6k85g-node1-installer,ceph-sunilkumar-81-00-d6k85g-node2,ceph-sunilkumar-81-00-d6k85g-node3 (age 2d)
    mgr: ceph-sunilkumar-81-00-d6k85g-node1-installer.qnxdec(active, since 2d), standbys: ceph-sunilkumar-81-00-d6k85g-node2.qmritp
    osd: 12 osds: 12 up (since 2d), 12 in (since 2d)
    nvmeof (rbd.): 4 gateways: 0 active ()
    nvmeof (rbd.group1): 4 gateways: 4 active (rbd.ceph-sunilkumar-81-00-d6k85g-node6.xndlff, rbd.ceph-sunilkumar-81-00-d6k85g-node7.rlmenl, rbd.ceph-sunilkumar-81-00-d6k85g-node8.qwbghd, rbd.ceph-sunilkumar-81-00-d6k85g-node9.aagjid)
    nvmeof (rbd.group2): 2 gateways: 2 active (ceph-sunilkumar-81-00-d6k85g-node4.iedqbf, ceph-sunilkumar-81-00-d6k85g-node5.duiuns)

  data:
    pools:   3 pools, 161 pgs
    objects: 10.35k objects, 40 GiB
    usage:   82 GiB used, 158 GiB / 240 GiB avail
    pgs:     161 active+clean

  io:
    client: 26 KiB/s rd, 0 B/s wr, 7 op/s rd, 3 op/s wr
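Possibly relevant to the fix (an assumption on my part; I have not inspected the dashboard JSON): the mgr Prometheus module exports each active health check as a ceph_health_detail series whose value is simply 1 while the check is firing. If the panel is built directly on that series, it can only ever show 1 for NVMEOF_GATEWAY_DOWN, no matter how many gateways are unavailable. A quick way to see what the exporter publishes (sketch; 9283 is the default mgr exporter port, replace <mgr-host> with an actual mgr host):

# Sketch only: ceph_health_detail is a per-health-check flag, so it carries no gateway count.
curl -s http://<mgr-host>:9283/metrics | grep 'ceph_health_detail.*NVMEOF_GATEWAY_DOWN'
# Typically prints something like:
#   ceph_health_detail{name="NVMEOF_GATEWAY_DOWN",severity="HEALTH_WARN"} 1.0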