Description of problem:
Ceph cluster goes to an error state after multiple removals and deployments of ISCSI:

    health: HEALTH_ERR
            Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-pnataraj-7ypsv7-node3' does not exist retval: -2

Version-Release number of selected component (if applicable):
[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph version
ceph version 16.2.0-79.el8cp (63c5c96018da6d39383c8f5ae534a0d1523fc274) pacific (stable)

How reproducible:

Steps to Reproduce:
1. Deploy a 5.0 cluster with mgr, mon, and osd services
2. Create a pool and deploy ISCSI with 4 gateways
3. Check "ceph orch ls" for the service status
4. Remove the service and redeploy ISCSI 2-3 times
5. Check the cluster health and "ceph orch ls"

I copied the keyring, conf, and cephadm to the primary gateway and performed removal of ISCSI from the gateway node, and vice versa from the bootstrap node.

Actual results:
Seeing the below error in "ceph orch ls":

[root@ceph-pnataraj-7ypsv7-node1-installer cephuser]# cephadm shell
Inferring fsid f64f341c-655d-11eb-8778-fa163e914bcc
Inferring config /var/lib/ceph/f64f341c-655d-11eb-8778-fa163e914bcc/mon.ceph-pnataraj-7ypsv7-node1-installer/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b37e99428d2304e11982d192a2a948526dd19c2196685dce656f205f3400de27

[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph status
  cluster:
    id:     f64f341c-655d-11eb-8778-fa163e914bcc
    health: HEALTH_ERR
            Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-pnataraj-7ypsv7-node3' does not exist retval: -2

  services:
    mon: 3 daemons, quorum ceph-pnataraj-7ypsv7-node1-installer,ceph-pnataraj-7ypsv7-node6,ceph-pnataraj-7ypsv7-node2 (age 4h)
    mgr: ceph-pnataraj-7ypsv7-node1-installer.jxhifn(active, since 6d), standbys: ceph-pnataraj-7ypsv7-node2.gzykir
    osd: 12 osds: 12 up (since 6d), 12 in (since 6d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   9 pools, 233 pgs
    objects: 631 objects, 1.5 GiB
    usage:   13 GiB used, 167 GiB / 180 GiB avail
    pgs:     233 active+clean

  io:
    client:   2.5 KiB/s rd, 2 op/s rd, 0 op/s wr

[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph orch ls
NAME                       RUNNING  REFRESHED   AGE  PLACEMENT
alertmanager               2/2      19h ago     6d   ceph-pnataraj-7ypsv7-node1-installer;ceph-pnataraj-7ypsv7-node2
grafana                    1/1      19h ago     6d   ceph-pnataraj-7ypsv7-node1-installer
iscsi.iscsi                3/4      <deleting>  19h  ceph-pnataraj-7ypsv7-node3;ceph-pnataraj-7ypsv7-node4;ceph-pnataraj-7ypsv7-node5;ceph-pnataraj-7ypsv7-node8
mgr                        2/2      19h ago     22h  ceph-pnataraj-7ypsv7-node1-installer;ceph-pnataraj-7ypsv7-node2;count:2
mon                        3/3      19h ago     6d   label:mon
node-exporter              8/8      19h ago     6d   *
osd.all-available-devices  12/20    19h ago     6d   *
prometheus                 1/1      19h ago     6d   ceph-pnataraj-7ypsv7-node1-installer
rgw.foo                    2/2      19h ago     6d   ceph-pnataraj-7ypsv7-node6;ceph-pnataraj-7ypsv7-node7;count:2

[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph version
ceph version 16.2.0-79.el8cp (63c5c96018da6d39383c8f5ae534a0d1523fc274) pacific (stable)
[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]#

Node details:
10.0.209.88 cephuser@cephuser root@q

No errors were noticed in the mgr logs.

NOTE: We also noticed the below error while adding the ISCSI gateway; because of it we removed the service and redeployed ISCSI, which is when we hit the issue above.

/iscsi-target...-igw/gateways> create ceph-gw-1 10.0.210.8
The first gateway defined must be the local machine
/iscsi-target...-igw/gateways> create ceph-gw-1 10.0.209.227
The first gateway defined must be the local machine
/iscsi-target...-igw/gateways>

Expected results:
The cluster should not enter an error state, irrespective of how many times the service is removed and redeployed.

Additional info:
ISCSI spec file for reference:

service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - ceph-pnataraj-7ypsv7-node3
    - ceph-pnataraj-7ypsv7-node4
    - ceph-pnataraj-7ypsv7-node5
    - ceph-pnataraj-7ypsv7-node8
spec:
  pool: iscsi
  trusted_ip_list: "10.0.210.8,10.0.209.227,10.0.210.191,10.0.211.111"
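For reference, the reproduction boils down to applying the spec above and then removing and re-applying the service. A minimal sketch of the commands, assuming the spec is saved as iscsi-spec.yaml (file and pool-creation details are only illustrative):

# steps 1-2: create the backing pool and deploy the iSCSI service from the spec
ceph osd pool create iscsi
ceph orch apply -i iscsi-spec.yaml

# step 3: check the service status
ceph orch ls

# step 4: remove the service and redeploy it from the same spec, 2-3 times
ceph orch rm iscsi.iscsi
ceph orch apply -i iscsi-spec.yaml

# step 5: check cluster health and service status
ceph status
ceph orch ls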
http://pastebin.test.redhat.com/996372 - mgr logs snippet.
Upstream tracker: https://tracker.ceph.com/issues/52866
Upstream master PR: https://github.com/ceph/ceph/pull/43454
The issue is still seen with the latest ceph version, 16.2.7-9.el8cp:

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch apply iscsi test1 --placement="ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node5" --trusted_ip_list="10.0.211.165,10.0.209.32" admin admin
Scheduled iscsi.test1 update...
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
iscsi.test1                       0/2      -          5s   ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      108s ago   1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      108s ago   1h   label:mon
osd.all-available-devices         16       108s ago   1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
iscsi.test1                       2/2      -          10s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      112s ago   1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      112s ago   1h   label:mon
osd.all-available-devices         16       112s ago   1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch apply iscsi test1 --placement="ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node5" --trusted_ip_list="10.0.211.165,10.0.209.32" admin admin^C
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch rm iscsi.test1
Removed service iscsi.test1
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr                               1/1      21s ago    1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      32s ago    1h   label:mon
osd.all-available-devices         16       73s ago    1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch apply iscsi test1 --placement="ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node5" --trusted_ip_list="10.0.211.165,10.0.209.32" admin admin
Scheduled iscsi.test1 update...
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch rm iscsi.test1
Removed service iscsi.test1
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.test1                       1/2      <deleting>  17s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      8s ago      1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      56s ago     1h   label:mon
osd.all-available-devices         16       97s ago     1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.test1                       1/2      <deleting>  23s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      15s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      62s ago     1h   label:mon
osd.all-available-devices         16       104s ago    1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.test1                       1/2      <deleting>  25s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      16s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      64s ago     1h   label:mon
osd.all-available-devices         16       106s ago    1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.test1                       1/2      <deleting>  27s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      18s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      66s ago     1h   label:mon
osd.all-available-devices         16       107s ago    1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]#
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]#
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.test1                       1/2      <deleting>  48s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      40s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      87s ago     1h   label:mon
osd.all-available-devices         16       2m ago      1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.test1                       1/2      <deleting>  56s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5
mgr                               1/1      48s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer
mon                               3/3      95s ago     1h   label:mon
osd.all-available-devices         16       2m ago      1h   *
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph status
  cluster:
    id:     f64f341c-655d-11eb-8778-fa163e914bcc
    health: HEALTH_ERR
            Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-ci-lfir5-kmnh9c-node1-installer' does not exist retval: -2

  services:
    mon: 3 daemons, quorum ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node2,ceph-ci-lfir5-kmnh9c-node6 (age 25h)
    mgr: ceph-ci-lfir5-kmnh9c-node1-installer.pnbxql(active, since 25h)
    osd: 16 osds: 16 up (since 25h), 16 in (since 25h)

  data:
    pools:   8 pools, 201 pgs
    objects: 204 objects, 6.0 KiB
    usage:   590 MiB used, 239 GiB / 240 GiB avail
    pgs:     201 active+clean

  io:
    client:   852 B/s rd, 0 op/s rd, 0 op/s wr

[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]#
This issue occurs if the iscsi service is removed before the iscsi gateway list has been updated with the deployed daemons (i.e. the error occurs if `ceph dashboard iscsi-gateway-list` is still empty when `ceph orch rm iscsi.iscsi` is run). If enough time has passed for the gateway list to be populated, no error occurs when the service is removed. Upstream PR for this: https://github.com/ceph/ceph/pull/44549
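Until the fix lands, one way to avoid the race is to confirm the gateway list is populated before removing the service. A minimal sketch based on the commands mentioned above (not an official workaround):

# Check that the dashboard knows about the deployed gateways; removing the
# service while this list is still empty is what triggers the HEALTH_ERR above.
ceph dashboard iscsi-gateway-list

# Only once the deployed gateways appear in the list:
ceph orch rm iscsi.iscsi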
Working as expected. No errors seen after multiple removal and deployment of ISCSI. Verified with the latest ceph version:

[ceph: root@magna031 /]# ceph orch ls
NAME               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager       ?:9093,9094  1/1      18s ago    5M   count:1
crash                           9/9      6m ago     5M   *
grafana            ?:3000       1/1      18s ago    5M   count:1
iscsi.iscsipool                 2/2      -          10s  magna031;magna006
mds.remote                      2/2      5m ago     11d  depressa004.ceph.redhat.com;depressa005.ceph.redhat.com;count:2
mgr                             3/3      5m ago     5M   magna031;magna032;magna006;count:3
mon                             3/3      5m ago     5M   magna031;magna032;magna006;count:3
node-exporter      ?:9100       9/9      6m ago     5M   *
osd                             15       6m ago     -    <unmanaged>
osd.osd_with_nvme               12       5m ago     4M   depressa00[4-6].ceph.redhat.com
prometheus         ?:9095       1/1      18s ago    5M   count:1
rbd-mirror                      1/1      18s ago    5M   magna031
rgw.foo            ?:80         2/2      5m ago     4M   count:2
[ceph: root@magna031 /]# ceph status
  cluster:
    id:     d6e5c458-0f10-11ec-9663-002590fc25a4
    health: HEALTH_OK

  services:
    mon:        3 daemons, quorum magna031,magna032,magna006 (age 93m)
    mgr:        magna006.vxieja(active, since 94m), standbys: magna031.xqwypm, magna032.lzjsxg
    mds:        1/1 daemons up, 1 standby
    osd:        27 osds: 27 up (since 82m), 27 in (since 4M)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   30 pools, 913 pgs
    objects: 151.67k objects, 574 GiB
    usage:   2.9 TiB used, 105 TiB / 108 TiB avail
    pgs:     913 active+clean

  io:
    client:   511 B/s rd, 85 B/s wr, 0 op/s rd, 0 op/s wr

[ceph: root@magna031 /]#
[ceph: root@magna031 /]# ceph version
ceph version 16.2.7-67.el8cp (2ff107c73e8642c55c83296928b5102b785ff4e2) pacific (stable)
[ceph: root@magna031 /]#
*** Bug 2049006 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1174
*** Bug 2034789 has been marked as a duplicate of this bug. ***