Description of problem:
Upgraded the setup from 4.x to 5.x; the upgrade was successful and the iscsi daemons started properly. When converting the storage cluster daemons to run under cephadm with "ansible-playbook infrastructure-playbooks/cephadm-adopt.yml -i hosts", the iscsi daemons failed to start until we started them manually.

Version-Release number of selected component (if applicable):
ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)

How reproducible:
100%

Steps to Reproduce:
1. Install a 4.x build on a fresh cluster.
2. Configure iscsi on the cluster.
3. Upgrade from 4.x to 5.x and check the iscsi configuration.
4. Convert the storage cluster daemons to run under cephadm.
5. Check the iscsi configuration.

Actual results:
No tcmu-runner portals are active on the cluster:

  cluster:
    id:     7c80e5b9-c8ae-4fbb-b9e5-36b4288182d0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            insufficient standby MDS daemons available
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum ceph-gp-rbd-fqj2nr-node2,ceph-gp-rbd-fqj2nr-node3,ceph-gp-rbd-fqj2nr-node1-installer (age 4m)
    mgr: ceph-gp-rbd-fqj2nr-node1-installer(active, since 26m), standbys: ceph-gp-rbd-fqj2nr-node2
    mds: 1/1 daemons up
    osd: 12 osds: 12 up (since 3m), 12 in (since 19h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 1.09k objects, 4.1 GiB
    usage:   12 GiB used, 167 GiB / 180 GiB avail
    pgs:     193 active+clean

Expected results:
tcmu-runner portals should be shown, as below:

  cluster:
    id:     7c80e5b9-c8ae-4fbb-b9e5-36b4288182d0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            insufficient standby MDS daemons available
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum ceph-gp-rbd-fqj2nr-node2,ceph-gp-rbd-fqj2nr-node3,ceph-gp-rbd-fqj2nr-node1-installer (age 51s)
    mgr: ceph-gp-rbd-fqj2nr-node1-installer(active, since 23m), standbys: ceph-gp-rbd-fqj2nr-node2
    mds: 1/1 daemons up
    osd: 12 osds: 12 up (since 45s), 12 in (since 19h)
    tcmu-runner: 1 daemon active (1 hosts)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 1.09k objects, 4.1 GiB
    usage:   12 GiB used, 167 GiB / 180 GiB avail
    pgs:     193 active+clean

Additional info:
After restarting the API services manually, the portals show up again.
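For reference, a minimal sketch of the adoption step and the manual workaround described above. All commands are taken from this report; host names, unit names, and inventory paths will differ per environment:

# Convert the ceph-ansible managed daemons to cephadm (run from the admin/installer node)
ansible-playbook infrastructure-playbooks/cephadm-adopt.yml -i hosts

# Workaround applied on each iSCSI gateway node where tcmu-runner stayed down
systemctl enable rbd-target-api
systemctl start rbd-target-api

# Verify the gateway is back
systemctl status tcmu-runner
ceph -s        # "tcmu-runner: N daemons active" should be listed under services again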
On the client side, before converting the storage daemons with cephadm:
------------------------------------------------------------------------
[root@ceph-gp-rbd-fqj2nr-node7 ~]# multipath -ll
3600140580c6fc634c3449dfb4f6264b4 dm-0 LIO-ORG,TCMU device
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=enabled
| `- 2:0:0:0 sda 8:0  active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  `- 3:0:0:0 sdb 8:16 active ready running
[root@ceph-gp-rbd-fqj2nr-node7 ~]#

[root@ceph-gp-rbd-fqj2nr-node7 cephuser]# mount /dev/mapper/3600140580c6fc634c3449dfb4f6264b4 /tmp/iscsi_ditr
[root@ceph-gp-rbd-fqj2nr-node7 cephuser]# cd /tmp/iscsi_ditr
[root@ceph-gp-rbd-fqj2nr-node7 iscsi_ditr]# dd if=/dev/zero of=file1.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.19474 s, 336 MB/s
[root@ceph-gp-rbd-fqj2nr-node7 iscsi_ditr]# dd if=/dev/zero of=file2.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.96971 s, 270 MB/s
[root@ceph-gp-rbd-fqj2nr-node7 iscsi_ditr]#

  cluster:
    id:     7c80e5b9-c8ae-4fbb-b9e5-36b4288182d0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            insufficient standby MDS daemons available
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum ceph-gp-rbd-fqj2nr-node2,ceph-gp-rbd-fqj2nr-node3,ceph-gp-rbd-fqj2nr-node1-installer (age 12m)
    mgr: ceph-gp-rbd-fqj2nr-node1-installer(active, since 11m), standbys: ceph-gp-rbd-fqj2nr-node2
    mds: 1/1 daemons up
    osd: 12 osds: 12 up (since 9m), 12 in (since 19h)
    tcmu-runner: 2 daemons active (2 hosts)

Run infrastructure-playbooks/cephadm-adopt.yml:

TASK [update the placement of iscsigw hosts] ********************************************************************************************************************************
Wednesday 01 September 2021  22:05:19 -0400 (0:00:00.311)       0:04:36.762 ***
ok: [ceph-gp-rbd-fqj2nr-node3 -> ceph-gp-rbd-fqj2nr-node1-installer]

PLAY [stop and remove legacy iscsigw daemons] *******************************************************************************************************************************

TASK [stop and disable iscsigw systemd services] ****************************************************************************************************************************
Wednesday 01 September 2021  22:05:21 -0400 (0:00:02.119)       0:04:38.881 ***
changed: [ceph-gp-rbd-fqj2nr-node3] => (item=rbd-target-api)
changed: [ceph-gp-rbd-fqj2nr-node3] => (item=rbd-target-gw)
changed: [ceph-gp-rbd-fqj2nr-node3] => (item=tcmu-runner)

TASK [reset failed iscsigw systemd units] ***********************************************************************************************************************************
Wednesday 01 September 2021  22:05:29 -0400 (0:00:07.798)       0:04:46.679 ***
ok: [ceph-gp-rbd-fqj2nr-node3] => (item=rbd-target-api)
ok: [ceph-gp-rbd-fqj2nr-node3] => (item=rbd-target-gw)
ok: [ceph-gp-rbd-fqj2nr-node3] => (item=tcmu-runner)

TASK [remove iscsigw systemd unit files] ************************************************************************************************************************************
Wednesday 01 September 2021  22:05:30 -0400 (0:00:00.769)       0:04:47.448 ***
changed: [ceph-gp-rbd-fqj2nr-node3] => (item=rbd-target-api)
changed: [ceph-gp-rbd-fqj2nr-node3] => (item=rbd-target-gw)
changed: [ceph-gp-rbd-fqj2nr-node3] => (item=tcmu-runner)

PLAY [stop and remove legacy iscsigw daemons] *******************************************************************************************************************************
TASK [stop and disable iscsigw systemd services] ****************************************************************************************************************************
Wednesday 01 September 2021  22:05:30 -0400 (0:00:00.692)       0:04:48.141 ***
changed: [ceph-gp-rbd-fqj2nr-node5] => (item=rbd-target-api)
changed: [ceph-gp-rbd-fqj2nr-node5] => (item=rbd-target-gw)
changed: [ceph-gp-rbd-fqj2nr-node5] => (item=tcmu-runner)

TASK [reset failed iscsigw systemd units] ***********************************************************************************************************************************
Wednesday 01 September 2021  22:05:38 -0400 (0:00:08.101)       0:04:56.243 ***
ok: [ceph-gp-rbd-fqj2nr-node5] => (item=rbd-target-api)
ok: [ceph-gp-rbd-fqj2nr-node5] => (item=rbd-target-gw)
ok: [ceph-gp-rbd-fqj2nr-node5] => (item=tcmu-runner)

Check the "ceph -s" status:
---------------------------
  cluster:
    id:     7c80e5b9-c8ae-4fbb-b9e5-36b4288182d0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            insufficient standby MDS daemons available
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum ceph-gp-rbd-fqj2nr-node2,ceph-gp-rbd-fqj2nr-node3,ceph-gp-rbd-fqj2nr-node1-installer (age 4m)
    mgr: ceph-gp-rbd-fqj2nr-node1-installer(active, since 4m), standbys: ceph-gp-rbd-fqj2nr-node2
    mds: 1/1 daemons up
    osd: 12 osds: 12 up (since 2m), 12 in (since 19h)

  data:
    volumes: 1/1 healthy

[root@ceph-gp-rbd-fqj2nr-node3 cephuser]# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:tcmu-runner(8)

Sep 01 22:05:28 ceph-gp-rbd-fqj2nr-node3 conmon[116638]: teardown: Sending SIGTERM to PID 81
Sep 01 22:05:28 ceph-gp-rbd-fqj2nr-node3 conmon[116638]: teardown: Waiting PID 81 to terminate .
Sep 01 22:05:28 ceph-gp-rbd-fqj2nr-node3 conmon[116638]:
Sep 01 22:05:28 ceph-gp-rbd-fqj2nr-node3 conmon[116638]: (process:81): GLib-GObject-CRITICAL **: 22:05:28.996: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Sep 01 22:05:29 ceph-gp-rbd-fqj2nr-node3 conmon[116638]:
Sep 01 22:05:29 ceph-gp-rbd-fqj2nr-node3 conmon[116638]: teardown: Process 81 is terminated
Sep 01 22:05:29 ceph-gp-rbd-fqj2nr-node3 sh[125169]: 5a95091781868327a26d71f3ab91b0152d438179b4fa9b98fd55e9181e3df0dc
Sep 01 22:05:29 ceph-gp-rbd-fqj2nr-node3 systemd[1]: tcmu-runner.service: Main process exited, code=exited, status=143/n/a
Sep 01 22:05:29 ceph-gp-rbd-fqj2nr-node3 systemd[1]: tcmu-runner.service: Failed with result 'exit-code'.
Sep 01 22:05:29 ceph-gp-rbd-fqj2nr-node3 systemd[1]: Stopped TCMU Runner.
[root@ceph-gp-rbd-fqj2nr-node3 cephuser]#

Restart API services and check the status:
------------------------------------------
[root@ceph-gp-rbd-fqj2nr-node3 cephuser]# systemctl enable rbd-target-api
Created symlink /etc/systemd/system/multi-user.target.wants/rbd-target-api.service → /usr/lib/systemd/system/rbd-target-api.service.
[root@ceph-gp-rbd-fqj2nr-node3 cephuser]# systemctl start rbd-target-api
[root@ceph-gp-rbd-fqj2nr-node3 cephuser]# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-09-01 22:12:24 EDT; 4s ago
     Docs: man:tcmu-runner(8)
 Main PID: 125704 (tcmu-runner)
    Tasks: 24 (limit: 23505)
   Memory: 37.8M
   CGroup: /system.slice/tcmu-runner.service
           └─125704 /usr/bin/tcmu-runner

  cluster:
    id:     7c80e5b9-c8ae-4fbb-b9e5-36b4288182d0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            insufficient standby MDS daemons available
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum ceph-gp-rbd-fqj2nr-node2,ceph-gp-rbd-fqj2nr-node3,ceph-gp-rbd-fqj2nr-node1-installer (age 12m)
    mgr: ceph-gp-rbd-fqj2nr-node1-installer(active, since 11m), standbys: ceph-gp-rbd-fqj2nr-node2
    mds: 1/1 daemons up
    osd: 12 osds: 12 up (since 9m), 12 in (since 19h)
    tcmu-runner: 2 daemons active (2 hosts)

Check on client side for creating files:
----------------------------------------
[root@ceph-gp-rbd-fqj2nr-node7 iscsi_ditr]# dd if=/dev/zero of=file3.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 208.984 s, 5.1 MB/s
[root@ceph-gp-rbd-fqj2nr-node7 iscsi_ditr]# cd
[root@ceph-gp-rbd-fqj2nr-node7 ~]#
[root@ceph-gp-rbd-fqj2nr-node7 ~]# cd /tmp/iscsi_ditr/
[root@ceph-gp-rbd-fqj2nr-node7 iscsi_ditr]# dd if=/dev/zero of=file4.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.1933 s, 256 MB/s
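As a quick sanity check after the workaround, the orchestrator view can be compared with the per-node systemd state. This is only a sketch built from commands that appear elsewhere in this report (ceph orch ls / ceph orch host ls / systemctl status), not additional captured output:

# From the admin node: service placement as cephadm sees it
ceph orch ls          # the iscsi.rbd service is placed by count:2;label:iscsigws
ceph orch host ls     # node3 and node5 carry the iscsigws label

# On each gateway node: the daemons that the adoption playbook stopped
systemctl status tcmu-runner
systemctl status rbd-target-api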
Hi Xiubo,

You need to switch to root after logging in as cephuser (run "sudo su"). Otherwise, prefix your commands with "sudo".
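For example (a generic illustration of the two options, using a command already shown in this report):

# Option 1: become root first, then run the commands
sudo su
systemctl status tcmu-runner

# Option 2: stay as cephuser and prefix each command with sudo
sudo systemctl status tcmu-runner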
Hi Guillaume,

Can you look into this issue?
Hi Guillaume,

Setup is still there.

[ceph: root@ceph-gp-rbd-fqj2nr-node1-installer ~]# ceph -s
  cluster:
    id:     7c80e5b9-c8ae-4fbb-b9e5-36b4288182d0
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
            insufficient standby MDS daemons available
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum ceph-gp-rbd-fqj2nr-node2,ceph-gp-rbd-fqj2nr-node3,ceph-gp-rbd-fqj2nr-node1-installer (age 5d)
    mgr: ceph-gp-rbd-fqj2nr-node1-installer(active, since 5d), standbys: ceph-gp-rbd-fqj2nr-node2
    mds: 1/1 daemons up
    osd: 12 osds: 12 up (since 5d), 12 in (since 6d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 1.09k objects, 4.1 GiB
    usage:   13 GiB used, 167 GiB / 180 GiB avail
    pgs:     193 active+clean

  io:
    client:   853 B/s rd, 0 op/s rd, 0 op/s wr

[ceph: root@ceph-gp-rbd-fqj2nr-node1-installer ~]# ceph orch ls
NAME        RUNNING  REFRESHED  AGE  PLACEMENT
crash           0/6  -          5d   label:ceph
iscsi.rbd       0/2  -          5d   count:2;label:iscsigws
mds.cephfs      0/3  -          5d   count:3;label:mdss
mgr             0/2  -          5d   count:2;label:mgrs
mon             0/3  -          5d   count:3;label:mons

[ceph: root@ceph-gp-rbd-fqj2nr-node1-installer ~]# ceph orch host ls
HOST                                ADDR          LABELS                                STATUS
ceph-gp-rbd-fqj2nr-node1-installer  10.0.210.115  mgrs mons ceph
ceph-gp-rbd-fqj2nr-node2            10.0.209.21   mgrs mons osds ceph
ceph-gp-rbd-fqj2nr-node3            10.0.210.56   iscsigws mons osds ceph
ceph-gp-rbd-fqj2nr-node4            10.0.208.78   mdss ceph
ceph-gp-rbd-fqj2nr-node5            10.0.209.206  iscsigws mdss osds ceph
ceph-gp-rbd-fqj2nr-node6            10.0.210.179  grafana-server mdss monitoring ceph
[ceph: root@ceph-gp-rbd-fqj2nr-node1-installer ~]#

[root@ceph-gp-rbd-fqj2nr-node3 cephuser]# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-09-01 22:26:22 EDT; 5 days ago
     Docs: man:tcmu-runner(8)
 Main PID: 7350 (tcmu-runner)
    Tasks: 5 (limit: 23465)
   Memory: 45.6M
   CGroup: /system.slice/tcmu-runner.service
           └─7350 /usr/bin/tcmu-runner

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
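If it helps with triage, two generic checks (standard cephadm/orchestrator commands, not output already captured above) that show whether the iscsi daemons were actually adopted on the gateway nodes:

# On a gateway node: daemons cephadm knows about on this host; an adopted
# iscsi daemon should be listed here if cephadm-adopt.yml converted it
cephadm ls

# From the admin node: per-daemon status as reported by the orchestrator
ceph orch ps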
Hi Guillaume,

I don't have the setup right now, but I will share a new setup soon after reproducing the issue.

Thanks,
Gopi
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5020