Description of problem:

If the iscsi service is removed and the dashboard isn't deployed (dashboard mgr module not enabled), the cluster status goes to HEALTH_ERR and the removal is stuck in the <deleting> state.

Version-Release number of selected component (if applicable):

# ceph --version
ceph version 16.2.0-98.el8cp (9c6352ff5276f8fb2029981206f3516707220054) pacific (stable)
# rpm -qa cephadm
cephadm-16.2.0-98.el8cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Bootstrap a cluster without the dashboard:
   cephadm bootstrap --mon-ip x.x.x.x --skip-dashboard --skip-monitoring-stack
2. Add some OSDs.
3. Deploy the iscsi service.
4. Remove iscsi with:
   ceph orch rm iscsi.iscsi

Actual results:

# ceph orch ls --service_type iscsi
NAME         RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.iscsi  0/1      <deleting>  7m   cephaio

# ceph health detail
HEALTH_ERR Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: Module 'dashboard' is not enabled (required by command 'dashboard iscsi-gateway-rm'): use `ceph mgr module enable dashboard` to enable it retval: -95
[ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: Module 'dashboard' is not enabled (required by command 'dashboard iscsi-gateway-rm'): use `ceph mgr module enable dashboard` to enable it retval: -95
    Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: Module 'dashboard' is not enabled (required by command 'dashboard iscsi-gateway-rm'): use `ceph mgr module enable dashboard` to enable it retval: -95

Expected results:

The iscsi service should be removed correctly and the cluster status should be HEALTH_OK.

Additional info:

This is a regression introduced by [1], which added the `ceph dashboard iscsi-gateway-rm xxx` call to the service removal path without first checking whether the dashboard module is enabled. The change only landed upstream in v16.2.5, but it has been cherry-picked into the downstream build. The dashboard module is indeed not among the enabled modules:

# ceph mgr module ls --format json | python3 -c 'import sys, json; print(json.load(sys.stdin)["enabled_modules"])'
['cephadm', 'iostat', 'restful']

[1] https://github.com/ceph/ceph/commit/1b9e3edcfd1c1a3dd02d6eb14072494f57b086a8
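For reference, the missing guard amounts to checking the enabled mgr modules before issuing the dashboard command. Below is a minimal standalone sketch of that check in Python (illustrative only, not the actual cephadm patch; it assumes the `ceph` CLI is on the path, and the helper names are made up):

#!/usr/bin/env python3
# Sketch of the missing guard: only call `ceph dashboard iscsi-gateway-rm`
# when the dashboard mgr module is actually enabled.
import json
import subprocess

def enabled_mgr_modules():
    # Same query as used above to list the enabled mgr modules.
    out = subprocess.check_output(
        ['ceph', 'mgr', 'module', 'ls', '--format', 'json'])
    return json.loads(out)['enabled_modules']

def rm_iscsi_gateway(name):
    if 'dashboard' in enabled_mgr_modules():
        subprocess.check_call(['ceph', 'dashboard', 'iscsi-gateway-rm', name])
    else:
        # Without this check, the cephadm module fails with retval -95
        # and the service removal stays in <deleting>.
        print("dashboard module not enabled; skipping iscsi-gateway-rm")

if __name__ == '__main__':
    rm_iscsi_gateway('iscsi.iscsi')

As a workaround on an affected cluster, enabling the module as the error message suggests (`ceph mgr module enable dashboard`) and then restarting the failed cephadm module, e.g. via `ceph mgr fail`, should presumably let the stuck removal complete, though this is untested here.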
The issue is not seen with the latest ceph version:

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
iscsi.test1                       2/2      -          7s   ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      2m ago     0h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      8m ago     1h   label:mon
osd.all-available-devices         16       9m ago     0h   *

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch rm iscsi.test1
Removed service iscsi.test1

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph status
  cluster:
    id:     f64f341c-655d-11eb-8778-fa163e914bcc
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node2,ceph-ci-lfir5-kmnh9c-node6 (age 25h)
    mgr: ceph-ci-lfir5-kmnh9c-node1-installer.pnbxql(active, since 24h)
    osd: 16 osds: 16 up (since 24h), 16 in (since 24h)

  data:
    pools:   8 pools, 201 pgs
    objects: 204 objects, 6.0 KiB
    usage:   590 MiB used, 239 GiB / 240 GiB avail
    pgs:     201 active+clean

  io:
    client:   2.3 KiB/s rd, 2 op/s rd, 0 op/s wr

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr                               1/1      5s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      9m ago     1h   label:mon
osd.all-available-devices         16       10m ago    0h   *

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph version
ceph version 16.2.7-9.el8cp (ecbd003fb25bd255d778de612f289b7ac9db7f27) pacific (stable)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174