Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 2322347

Summary: Change ceph health to HEALTH_WARN if active nvmeof Gateway daemon is not in running state
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rahul Lepakshi <rlepaksh>
Component: Cephadm
Assignee: Adam King <adking>
Status: CLOSED UPSTREAM
QA Contact: Rahul Lepakshi <rlepaksh>
Severity: urgent
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified    
Version: 8.0
CC: adking, bhkaur, cephqe-warriors, rpollack
Target Milestone: ---
Flags: rlepaksh: needinfo-
Target Release: 8.0z4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Cephadm does not emit health warnings when an active NVMe-oF daemon is stopped

Currently, Cephadm does not consider stopped daemons when generating health warnings. Health warnings are only triggered for daemons in a failed state. As a result, Cephadm does not emit a health warning if one or more active NVMe-oF daemons are stopped.

As a workaround, use the ceph orch ps --daemon-type nvmeof command to check the state of all NVMe-oF daemons. Check the values in the REFRESHED column of the output, which shows how long ago Cephadm last checked the state of the daemons. To refresh the information, use the ceph orch ps --refresh command. By default, the information is refreshed every 10 minutes. You can adjust this refresh rate by modifying the mgr/cephadm/daemon_cache_timeout value in seconds. For example, to set the refresh rate to every 5 minutes, use the ceph config set mgr mgr/cephadm/daemon_cache_timeout 300 command.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2026-03-04 08:55:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2317218    

Description Rahul Lepakshi 2024-10-29 06:44:22 UTC
Description of problem:
Currently, health is HEALTH_OK in ceph -s even though 8/32 nvmeof GWs are in a stopped/error state, as shown below. We need to move it to HEALTH_WARN, listing as the reason which particular daemons are down. This has to be backported to 7.1 as well, which is important. I remember having a BZ for this but I cannot find it now.

[ceph: root@tala001 /]# ceph -s
  cluster:
    id:     2573aca6-908d-11ef-ab6d-b4835101e4e4
    health: HEALTH_OK

  services:
    mon:    3 daemons, quorum tala001,tala002,tala003 (age 5d)
    mgr:    tala001.slxshz(active, since 3d), standbys: tala002.giruyl
    osd:    131 osds: 131 up (since 3d), 131 in (since 3d)
    nvmeof: 26 gateways active (26 hosts)

  data:
    pools:   7 pools, 481 pgs
    objects: 3.01M objects, 11 TiB
    usage:   35 TiB used, 333 TiB / 368 TiB avail
    pgs:     481 active+clean

  io:
    client:   5.8 MiB/s rd, 38 op/s rd, 0 op/s wr

[ceph: root@tala001 /]# ceph orch ps| grep nvmeof
nvmeof.nvmeof_pool.group2.tala011.lupkkr                     tala011                     *:5500,4420,8009  running (3d)      9s ago   3d    1482M        -                   d1890c2c521e  6e36b73af16e
nvmeof.nvmeof_pool.group2.tala012.grpqng                     tala012                     *:5500,4420,8009  running (3d)      9s ago   3d    1462M        -                   d1890c2c521e  de5d01ef2a4c
nvmeof.nvmeof_pool.group2.tala013.dalrcy                     tala013                     *:5500,4420,8009  running (3d)      9s ago   3d    1480M        -                   d1890c2c521e  acf7355ec64c
nvmeof.nvmeof_pool.group2.tala014.xopceq                     tala014                     *:5500,4420,8009  running (3d)      9s ago   3d    1469M        -                   d1890c2c521e  4123ab99d1e0
nvmeof.nvmeof_pool.group2.tala018.vhhxuf                     tala018                     *:5500,4420,8009  running (3d)      9s ago   3d    1484M        -                   d1890c2c521e  265c31cdf6c7
nvmeof.nvmeof_pool.group2.tala019.jlcpwu                     tala019                     *:5500,4420,8009  running (3d)      9s ago   3d    1462M        -                   d1890c2c521e  2e27c518d290
nvmeof.nvmeof_pool.group2.tala021.kzyoyi                     tala021                     *:5500,4420,8009  running (3d)      9s ago   3d    1471M        -                   d1890c2c521e  95ba5472c883
nvmeof.nvmeof_pool.group2.tala022.uttuut                     tala022                     *:5500,4420,8009  running (3d)      9s ago   3d    1486M        -                   d1890c2c521e  45735a968ec0
nvmeof.nvmeof_pool.group3.ceph-scale-2-py5fg8-node1.lkgbtj   ceph-scale-2-py5fg8-node1   *:5500,4420,8009  stopped           7s ago   3d        -        -  <unknown>        <unknown>     <unknown>
nvmeof.nvmeof_pool.group3.ceph-scale-2-py5fg8-node2.ijwedf   ceph-scale-2-py5fg8-node2   *:5500,4420,8009  stopped           7s ago   3d        -        -  <unknown>        <unknown>     <unknown>
nvmeof.nvmeof_pool.group3.ceph-scale-2-py5fg8-node3.obuxzq   ceph-scale-2-py5fg8-node3   *:5500,4420,8009  stopped           7s ago   3d        -        -  <unknown>        <unknown>     <unknown>
nvmeof.nvmeof_pool.group3.ceph-scale-2-py5fg8-node4.pzuoql   ceph-scale-2-py5fg8-node4   *:5500,4420,8009  stopped           7s ago   3d        -        -  <unknown>        <unknown>     <unknown>
nvmeof.nvmeof_pool.group3.tala023.pdxpap                     tala023                     *:5500,4420,8009  running (3d)      8s ago   3d    1669M        -                   d1890c2c521e  88b70bd5c796
nvmeof.nvmeof_pool.group3.tala024.udqraj                     tala024                     *:5500,4420,8009  running (3d)      8s ago   3d    1718M        -                   d1890c2c521e  362492a39537
nvmeof.nvmeof_pool.group3.tala025.wwbssy                     tala025                     *:5500,4420,8009  running (3d)      7s ago   3d    1742M        -                   d1890c2c521e  98918c880d23
nvmeof.nvmeof_pool.group3.tala026.ydnfgn                     tala026                     *:5500,4420,8009  running (3d)      7s ago   3d    1734M        -                   d1890c2c521e  7fea5c629401
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node5.cbyzpa   ceph-scale-2-py5fg8-node5   *:5500,4420,8009  running (3d)      4s ago   3d    1495M        -                   d1890c2c521e  f1f3bbae0c32
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node6.gmkwyd   ceph-scale-2-py5fg8-node6   *:5500,4420,8009  running (3d)      5s ago   3d    1465M        -                   d1890c2c521e  222c64422150
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node7.gdynqu   ceph-scale-2-py5fg8-node7   *:5500,4420,8009  running (3d)      6s ago   3d    1469M        -                   d1890c2c521e  c11af10aeac3
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node8.qdtuns   ceph-scale-2-py5fg8-node8   *:5500,4420,8009  running (3d)      6s ago   3d    1464M        -                   d1890c2c521e  c757eed19686
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node9.kyeeop   ceph-scale-2-py5fg8-node9   *:5500,4420,8009  running (3d)      6s ago   3d    1476M        -                   d1890c2c521e  20be8febd0bd
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node10.kdruzm  ceph-scale-2-py5fg8-node10  *:5500,4420,8009  running (3d)      6s ago   3d    1511M        -                   d1890c2c521e  7efc572dfa78
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node11.ivcdsc  ceph-scale-2-py5fg8-node11  *:5500,4420,8009  running (3d)      7s ago   3d    1501M        -                   d1890c2c521e  10a8414e7222
nvmeof.nvmeof_pool.group4.ceph-scale-2-py5fg8-node12.oxmyin  ceph-scale-2-py5fg8-node12  *:5500,4420,8009  running (3d)      7s ago   3d    1478M        -                   d1890c2c521e  0ac9c1634045
nvmeof.nvmeof_pool.tala003.fjsvct                            tala003                     *:5500,4420,8009  running (24h)     9m ago   5d     647M        -                   d1890c2c521e  bbd323193423
nvmeof.nvmeof_pool.tala004.mxjpxs                            tala004                     *:5500,4420,8009  running (24h)     9m ago   5d     643M        -                   d1890c2c521e  a5bb5f6b1fbc
nvmeof.nvmeof_pool.tala005.kgsiek                            tala005                     *:5500,4420,8009  running (24h)     9m ago   5d     651M        -                   d1890c2c521e  80b16963c41f
nvmeof.nvmeof_pool.tala006.frwxtb                            tala006                     *:5500,4420,8009  running (24h)     9m ago   5d     647M        -                   d1890c2c521e  aa07f514cf10
nvmeof.nvmeof_pool.tala007.rlqngl                            tala007                     *:5500,4420,8009  stopped           9m ago   3d        -        -  <unknown>        <unknown>     <unknown>
nvmeof.nvmeof_pool.tala008.ohvzfm                            tala008                     *:5500,4420,8009  stopped           9m ago   3d        -        -  <unknown>        <unknown>     <unknown>
nvmeof.nvmeof_pool.tala009.yivhzu                            tala009                     *:5500,4420,8009  running (24h)     9m ago   3d     644M        -                   d1890c2c521e  4a6e52fc832a
nvmeof.nvmeof_pool.tala010.tkqkmb                            tala010                     *:5500,4420,8009  running (24h)     9m ago   3d     650M        -                   d1890c2c521e  25a1413a1fb4
[ceph: root@tala001 /]# ceph orch ls | grep nvmeof
nvmeof.nvmeof_pool         ?:4420,5500,8009      6/8  10m ago    17h  tala003;tala004;tala005;tala006;tala007;tala008;tala009;tala010                                                                                                     
nvmeof.nvmeof_pool.group2  ?:4420,5500,8009      8/8  17s ago    17h  tala011;tala012;tala013;tala014;tala018;tala019;tala021;tala022                                                                                                     
nvmeof.nvmeof_pool.group3  ?:4420,5500,8009      4/8  17s ago    17h  tala023;tala024;tala025;tala026;ceph-scale-2-py5fg8-node1;ceph-scale-2-py5fg8-node2;ceph-scale-2-py5fg8-node3;ceph-scale-2-py5fg8-node4                             
nvmeof.nvmeof_pool.group4  ?:4420,5500,8009      8/8  16s ago    17h  ceph-scale-2-py5fg8-node12;ceph-scale-2-py5fg8-node11;ceph-scale-2-py5fg8-node10;ceph-scale-2-py5fg8-node9;ceph-scale-2-py5fg8-node8;ceph-scale-2-py5fg8-node7;ceph-scale-2-py5fg8-node6;ceph-scale-2-py5fg8-node5



Version-Release number of selected component (if applicable):

[ceph: root@tala001 /]# ceph version
ceph version 19.2.0-39.el9cp (ade19941ff2892c8fef06386a713d71e27e93a2c) squid (stable)


How reproducible: Always


Steps to Reproduce:
1. Deploy a ceph cluster at reef or squid and deploy the nvmeof service
2. Bring down a few nvmeof daemons (one way to do this is sketched after these steps) and observe that ceph health does not complain about it
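A minimal sketch for step 2, assuming the orchestrator is used to stop a gateway. The daemon name below is just one of the daemons from the ceph orch ps listing above; any nvmeof daemon name works:

[ceph: root@tala001 /]# ceph orch daemon stop nvmeof.nvmeof_pool.tala007.rlqngl
[ceph: root@tala001 /]# ceph orch ps --daemon-type nvmeof | grep tala007
nvmeof.nvmeof_pool.tala007.rlqngl    tala007    *:5500,4420,8009  stopped    ...
[ceph: root@tala001 /]# ceph -s | grep health
    health: HEALTH_OK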

Actual results: After bringing down a few nvmeof daemons, ceph health does not complain about it and remains at HEALTH_OK


Expected results: After bringing down a few nvmeof daemons, ceph health should complain about it and move to HEALTH_WARN (an illustrative sketch of the desired output follows)
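For illustration only, the desired behavior would look something like the sketch below; the warning wording is hypothetical, since this bug is precisely that no such warning exists today for stopped daemons, only for failed ones. The count of 6 matches the stopped daemons in the ceph orch ps listing above:

[ceph: root@tala001 /]# ceph -s
  cluster:
    id:     2573aca6-908d-11ef-ab6d-b4835101e4e4
    health: HEALTH_WARN
            6 stopped nvmeof daemon(s)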


Additional info:
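For reference, the workaround captured in the Doc Text above amounts to the following commands; the 300-second timeout is just the 5-minute example value from the Doc Text, not a recommendation:

[ceph: root@tala001 /]# ceph orch ps --daemon-type nvmeof
[ceph: root@tala001 /]# ceph orch ps --refresh
[ceph: root@tala001 /]# ceph config set mgr mgr/cephadm/daemon_cache_timeout 300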

Comment 7 Red Hat Bugzilla 2026-03-04 08:55:27 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.