Description of problem:
[Host Maintenance][NVMe] - When we try to place the host containing the last available NVMe gateway into maintenance mode, the operation is allowed without even a warning that IO may be interrupted. The last available NVMe gateway node should not be allowed to enter maintenance mode unless the user is first warned and required to pass the --force option.

Version-Release number of selected component (if applicable):
cp.stg.icr.io/cp/ibm-ceph/ceph-7-rhel9:7-56

How reproducible:
Always

Steps to Reproduce:
1. Create an RHCS 7.1 cluster with 4 NVMe gateway nodes.
2. Place 3 of them into maintenance mode. The remaining 4th node becomes active and picks up the IO for all disks:

[root@ceph-ibm-upgrade-hn699h-node11 ~]# ceph orch host ls
HOST                                     ADDR          LABELS                    STATUS
ceph-ibm-upgrade-hn699h-node1-installer  10.0.208.144  _admin,mgr,mon,installer
ceph-ibm-upgrade-hn699h-node2            10.0.209.108  mgr,mon
ceph-ibm-upgrade-hn699h-node3            10.0.208.135  mon,osd
ceph-ibm-upgrade-hn699h-node4            10.0.208.32   osd,mds
ceph-ibm-upgrade-hn699h-node5            10.0.208.105  osd,mds
ceph-ibm-upgrade-hn699h-node6            10.0.211.178  osd
ceph-ibm-upgrade-hn699h-node7            10.0.210.217  nvmeof-gw                 Maintenance
ceph-ibm-upgrade-hn699h-node8            10.0.211.22   nvmeof-gw                 Maintenance
ceph-ibm-upgrade-hn699h-node9            10.0.209.67   nvmeof-gw                 Maintenance
ceph-ibm-upgrade-hn699h-node10           10.0.208.252  nvmeof-gw
10 hosts in cluster

[root@ceph-ibm-upgrade-hn699h-node11 ~]# ceph nvme-gw show nvmeof ''
{
    "epoch": 56,
    "pool": "nvmeof",
    "group": "",
    "num gws": 4,
    "Anagrp list": "[ 1 2 3 4 ]"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node10.tisxus",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: ACTIVE , 2: ACTIVE , 3: ACTIVE , 4: ACTIVE ,"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node7.iamilt",
    "anagrp-id": 2,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node8.tnjcij",
    "anagrp-id": 3,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node9.mqoxhu",
    "anagrp-id": 4,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}

3. Try placing the last available NVMe gateway node into maintenance; it is allowed without so much as a warning:

[root@ceph-ibm-upgrade-hn699h-node11 ~]# ceph orch host maintenance enter ceph-ibm-upgrade-hn699h-node10
Daemons for Ceph cluster 9dbd3814-1d7c-11ef-8e61-fa163e083545 stopped on host ceph-ibm-upgrade-hn699h-node10.
Host ceph-ibm-upgrade-hn699h-node10 moved to maintenance mode

This then leads to IO interruption on the client.
[root@ceph-ibm-upgrade-hn699h-node11 ~]# ceph orch host ls
HOST                                     ADDR          LABELS                    STATUS
ceph-ibm-upgrade-hn699h-node1-installer  10.0.208.144  _admin,mgr,mon,installer
ceph-ibm-upgrade-hn699h-node2            10.0.209.108  mgr,mon
ceph-ibm-upgrade-hn699h-node3            10.0.208.135  mon,osd
ceph-ibm-upgrade-hn699h-node4            10.0.208.32   osd,mds
ceph-ibm-upgrade-hn699h-node5            10.0.208.105  osd,mds
ceph-ibm-upgrade-hn699h-node6            10.0.211.178  osd
ceph-ibm-upgrade-hn699h-node7            10.0.210.217  nvmeof-gw                 Maintenance
ceph-ibm-upgrade-hn699h-node8            10.0.211.22   nvmeof-gw                 Maintenance
ceph-ibm-upgrade-hn699h-node9            10.0.209.67   nvmeof-gw                 Maintenance
ceph-ibm-upgrade-hn699h-node10           10.0.208.252  nvmeof-gw                 Maintenance
10 hosts in cluster

[root@ceph-ibm-upgrade-hn699h-node11 ~]# ceph nvme-gw show nvmeof ''
{
    "epoch": 57,
    "pool": "nvmeof",
    "group": "",
    "num gws": 4,
    "Anagrp list": "[ 1 2 3 4 ]"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node10.tisxus",
    "anagrp-id": 1,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node7.iamilt",
    "anagrp-id": 2,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node8.tnjcij",
    "anagrp-id": 3,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node9.mqoxhu",
    "anagrp-id": 4,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 2: STANDBY , 3: STANDBY , 4: STANDBY ,"
}

Actual results:
All hosts containing NVMe gateways can be placed into maintenance mode, with no warning that IO may be interrupted.

Expected results:
The user should at least be given a warning and be required to pass the --force parameter if they wish to move the last remaining gateway into maintenance mode.
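For illustration, a minimal sketch of the guard this report asks for. All names here (check_maintenance_safe, NvmeofGateway, OrchestratorValidationError) are hypothetical and do not reflect cephadm's actual internals; the gateway fields are modeled on the "ceph nvme-gw show" output in the transcripts above.

from dataclasses import dataclass
from typing import List


@dataclass
class NvmeofGateway:
    gw_id: str         # e.g. "client.nvmeof.nvmeof.<host>.<suffix>"
    hostname: str      # host the gateway daemon runs on
    availability: str  # "AVAILABLE" or "UNAVAILABLE"


class OrchestratorValidationError(Exception):
    """Hypothetical stand-in for the orchestrator's validation error."""


def check_maintenance_safe(hostname: str, gateways: List[NvmeofGateway],
                           force: bool = False) -> None:
    """Refuse to put `hostname` into maintenance if it hosts the only
    AVAILABLE nvmeof gateway(s), unless the caller passed --force."""
    available = [gw for gw in gateways if gw.availability == "AVAILABLE"]
    on_host = [gw for gw in available if gw.hostname == hostname]
    if on_host and len(on_host) == len(available) and not force:
        raise OrchestratorValidationError(
            f"ALERT: host {hostname} runs the last AVAILABLE nvmeof "
            f"gateway(s) {[gw.gw_id for gw in on_host]}; placing it into "
            "maintenance will interrupt client IO. Pass --force to proceed.")


# Usage: reproduces the state from step 3 above, where node10 is the only
# AVAILABLE gateway, so the check raises instead of silently proceeding.
gws = [
    NvmeofGateway("client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node10.tisxus",
                  "ceph-ibm-upgrade-hn699h-node10", "AVAILABLE"),
    NvmeofGateway("client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node7.iamilt",
                  "ceph-ibm-upgrade-hn699h-node7", "UNAVAILABLE"),
    NvmeofGateway("client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node8.tnjcij",
                  "ceph-ibm-upgrade-hn699h-node8", "UNAVAILABLE"),
    NvmeofGateway("client.nvmeof.nvmeof.ceph-ibm-upgrade-hn699h-node9.mqoxhu",
                  "ceph-ibm-upgrade-hn699h-node9", "UNAVAILABLE"),
]
try:
    check_maintenance_safe("ceph-ibm-upgrade-hn699h-node10", gws)
except OrchestratorValidationError as e:
    print(e)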
Additional info:
I think this should be fixed for 7.1z1.
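Until such a guard lands, a pre-flight check along these lines could be run before entering maintenance. This is a sketch, not a Ceph API: it assumes "ceph nvme-gw show nvmeof ''" prints the stream of concatenated JSON objects shown in the transcripts above, and parse_json_stream is a helper written here for illustration.

import json
import subprocess
import sys


def parse_json_stream(text):
    """Yield each JSON object from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        rest = text[idx:].lstrip()
        if not rest:
            break
        obj, consumed = decoder.raw_decode(rest)
        idx += (len(text[idx:]) - len(rest)) + consumed
        yield obj


# Run the same command used in the reproduction steps and count gateways
# that still report "Availability": "AVAILABLE".
out = subprocess.run(["ceph", "nvme-gw", "show", "nvmeof", ""],
                     check=True, capture_output=True, text=True).stdout
available = [o["gw-id"] for o in parse_json_stream(out)
             if o.get("Availability") == "AVAILABLE"]
if len(available) <= 1:
    sys.exit(f"only {len(available)} AVAILABLE gateway(s) left: {available} "
             "-- entering maintenance now would interrupt client IO")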