Description of problem:
Unable to create the maximum number of LUNs per target. The container crashes after a number of LUNs have been created.

Version-Release number of selected component (if applicable):
ceph version 16.2.0-72.el8cp (1e802193e0b4084ffcdb2338dd09f08bbea54a1a) pacific (stable)

How reproducible: 100%

Steps to Reproduce:
1. Create the gateways using the spec file below.

[ceph: root@magna104 ~]# cat iscsi.yaml
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - magna108
    - magna113
spec:
  pool: iscsi_pool
  trusted_ip_list: "ipv4,ipv6"
  api_user: admin
  api_password: admin
[ceph: root@magna104 ~]#

2. Start the iSCSI gateways using gwcli.
3. Create the target and gateways.

/iscsi-targets> ls
o- iscsi-targets ................................................................................. [DiscoveryAuth: None, Targets: 1]
  o- iqn.2003-01.com.redhat.iscsi-gw:ceph-igw ............................................................ [Auth: None, Gateways: 2]
    o- disks ............................................................................................................ [Disks: 0]
    o- gateways .............................................................................................. [Up: 2/2, Portals: 2]
    | o- magna108 .............................................................................................. [10.8.128.108 (UP)]
    | o- magna113 .............................................................................................. [10.8.128.113 (UP)]
    o- host-groups .................................................................................................... [Groups : 0]
    o- hosts ......................................................................................... [Auth: ACL_ENABLED, Hosts: 0]
/iscsi-targets>

4. Create the client IQN.
5. Create images and add the disks to the client.

Actual results:

/iscsi-target...at:rh7-client> disk add iscsi_pool/image127
ok
/iscsi-target...at:rh7-client> disk add iscsi_pool/image128
Exception in thread Thread-11:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 1182, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python3.6/site-packages/gwcli/gateway.py", line 646, in check_gateways
    check_thread.start()
  File "/usr/lib64/python3.6/threading.py", line 846, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

[root@magna108 ubuntu]# podman exec -it ff1f0ffc5f35 sh
Error: no container with name or ID ff1f0ffc5f35 found: no such container
[root@magna108 ubuntu]#

Expected results:
The maximum number of LUNs per target should be created successfully.
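For step 5, a minimal scripted sketch of the bulk disk creation, rather than typing each command by hand. Assumptions: gwcli accepts one-shot commands on the command line (otherwise the same commands can be pasted into the interactive shell); the pool name and client IQN are taken from this report, and the image prefix and size are arbitrary.

POOL=iscsi_pool
CLIENT=/iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/hosts/iqn.1994-05.com.redhat:rh7-client
for i in $(seq 1 256); do
    # create the backing RBD image, then map it as the next LUN for the client
    gwcli /disks create pool=${POOL} image=image${i} size=100m
    gwcli ${CLIENT} disk add ${POOL}/image${i}
done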
I tried the same scenario on a 4.2 bare metal setup and it worked fine.

/iscsi-target...at:rh7-client> ls
o- iqn.1994-05.com.redhat:rh7-client ................................................. [LOGGED-IN, Auth: CHAP, Disks: 256(1074076M)]
  o- lun 0 ................................................. [rbd/rbd_disk(1.0T), Owner: dell-r730-068.dsal.lab.eng.rdu2.redhat.com]
  o- lun 1 ................................................... [rbd/disk_1(100M), Owner: dell-r730-065.dsal.lab.eng.rdu2.redhat.com]
  o- lun 2 ................................................... [rbd/disk_2(100M), Owner: dell-r730-068.dsal.lab.eng.rdu2.redhat.com]
  o- lun 3 ................................................... [rbd/disk_3(100M), Owner: dell-r730-065.dsal.lab.eng.rdu2.redhat.com]
  o- lun 4 ................................................... [rbd/disk_4(100M), Owner: dell-r730-068.dsal.lab.eng.rdu2.redhat.com]
  o- lun 5 ................................................... [rbd/disk_5(100M), Owner: dell-r730-065.dsal.lab.eng.rdu2.redhat.com]
  .
  .
  .
  o- lun 250 ............................................... [rbd/disk_250(100M), Owner: dell-r730-068.dsal.lab.eng.rdu2.redhat.com]
  o- lun 251 ............................................... [rbd/disk_251(100M), Owner: dell-r730-065.dsal.lab.eng.rdu2.redhat.com]
  o- lun 252 ............................................... [rbd/disk_252(100M), Owner: dell-r730-068.dsal.lab.eng.rdu2.redhat.com]
  o- lun 253 ............................................... [rbd/disk_253(100M), Owner: dell-r730-065.dsal.lab.eng.rdu2.redhat.com]
  o- lun 254 ............................................... [rbd/disk_254(100M), Owner: dell-r730-068.dsal.lab.eng.rdu2.redhat.com]
  o- lun 255 ............................................... [rbd/disk_255(100M), Owner: dell-r730-065.dsal.lab.eng.rdu2.redhat.com]
/iscsi-target...at:rh7-client> disk add rbd/disk_256 size=100m
Failed : Disk limit of 256 reached.
disk auto-define failed(8), try using the /disks create command
/iscsi-target...at:rh7-client>

/iscsi-target...at:rh7-client> goto gateways
/iscsi-target...6046/gateways> cd /disks
/disks> create rbd image=disk_1 size=100m
Failed : Disk limit of 256 reached.
(In reply to Gopi from comment #3)
> I tried same scenario on 4.2 bare metal setup and it worked fine.

So why isn't the bug marked as a regression?
Hi Yaniv,

Thanks for pointing that out. It looks like I missed adding the Regression keyword; I will make a note of it.
Ilya, could you verify that https://github.com/ceph/ceph/pull/42214 works for you?
This work is progressing, but it should not block 5.0, since these limits were rarely reached in practice. Moving to 5.1.
The issue still exists with the fix provided: we are unable to create the maximum number of LUNs per target (256) with the latest build. The containers crash after the 190th LUN is created. Previously we could only reach the 127th LUN; with the fix, only up to the 190th LUN gets created.

Health status before the crash, when the 190th LUN was added successfully:

[root@ceph-pnataraj-hkpq9z-node1-installer cephuser]# ceph status
  cluster:
    id:     7e821d06-c172-11ec-b5bf-fa163e42a69f
    health: HEALTH_WARN
            380 stray daemon(s) not managed by cephadm

  services:
    mon:         3 daemons, quorum ceph-pnataraj-hkpq9z-node1-installer,ceph-pnataraj-hkpq9z-node2,ceph-pnataraj-hkpq9z-node3 (age 3h)
    mgr:         ceph-pnataraj-hkpq9z-node1-installer.emcclr(active, since 3h), standbys: ceph-pnataraj-hkpq9z-node2.uvsywu
    osd:         10 osds: 10 up (since 3h), 10 in (since 3h)
    tcmu-runner: 380 portals active (2 hosts)

  data:
    pools:   4 pools, 97 pgs
    objects: 791 objects, 345 KiB
    usage:   446 MiB used, 200 GiB / 200 GiB avail
    pgs:     97 active+clean

  io:
    client:   1.7 KiB/s rd, 1 op/s rd, 0 op/s wr

Health status after the 191st LUN was added:

[root@ceph-pnataraj-hkpq9z-node1-installer cephuser]# ceph status
  cluster:
    id:     7e821d06-c172-11ec-b5bf-fa163e42a69f
    health: HEALTH_WARN
            380 stray daemon(s) not managed by cephadm

  services:
    mon:         3 daemons, quorum ceph-pnataraj-hkpq9z-node1-installer,ceph-pnataraj-hkpq9z-node2,ceph-pnataraj-hkpq9z-node3 (age 3h)
    mgr:         ceph-pnataraj-hkpq9z-node1-installer.emcclr(active, since 3h), standbys: ceph-pnataraj-hkpq9z-node2.uvsywu
    osd:         10 osds: 10 up (since 3h), 10 in (since 3h)
    tcmu-runner: 190 portals active (1 hosts)

  data:
    pools:   4 pools, 97 pgs
    objects: 791 objects, 345 KiB
    usage:   446 MiB used, 200 GiB / 200 GiB avail
    pgs:     97 active+clean

  io:
    client:   2.5 KiB/s rd, 2 op/s rd, 0 op/s wr

NOTE: The cluster health status does not change, but the tcmu-runner container has crashed.

[root@ceph-pnataraj-hkpq9z-node4 cephuser]# podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
a32804441d49  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87  -n osd.2 -f --set...  22 hours ago  Up 22 hours ago  ceph-7e821d06-c172-11ec-b5bf-fa163e42a69f-osd-2
0c498bce1657  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87  -n osd.4 -f --set...  22 hours ago  Up 22 hours ago  ceph-7e821d06-c172-11ec-b5bf-fa163e42a69f-osd-4
d8710998f84c  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87  -n osd.7 -f --set...  22 hours ago  Up 22 hours ago  ceph-7e821d06-c172-11ec-b5bf-fa163e42a69f-osd-7
53f8080a97a1  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87                        22 hours ago  Up 22 hours ago  ceph-7e821d06-c172-11ec-b5bf-fa163e42a69f-iscsi-iscsipool-ceph-pnataraj-hkpq9z-node4-rbibzm
[root@ceph-pnataraj-hkpq9z-node4 cephuser]#

[root@ceph-pnataraj-hkpq9z-node1-installer cephuser]# ceph version
ceph version 16.2.7-107.el8cp (3106079e34bb001fa0999e9b975bd5e8a413f424) pacific (stable)
[root@ceph-pnataraj-hkpq9z-node1-installer cephuser]#

Snippet of the disk add:

/iscsi-target...at:rh7-client> disk add iscsipool/test201
ok
/iscsi-target...at:rh7-client> disk add iscsipool/test202
ok
/iscsi-target...at:rh7-client> disk add iscsipool/test203
ok
/iscsi-target...at:rh7-client> disk add iscsipool/test204
ok
/iscsi-target...at:rh7-client>
/iscsi-target...at:rh7-client> disk add iscsipool/test205
ok
/iscsi-target...at:rh7-client> disk add iscsipool/test206
ok
/iscsi-target...at:rh7-client> disk add iscsipool/test207

[root@ceph-pnataraj-hkpq9z-node4 cephuser]# podman exec -it 045a9e5fd62f /bin/bash
Error: no container with name or ID "045a9e5fd62f" found: no such container
[root@ceph-pnataraj-hkpq9z-node4 cephuser]# ls
[root@ceph-pnataraj-hkpq9z-node4 cephuser]#

The tcmu-runner log and a snippet of the output are attached: http://pastebin.test.redhat.com/1046690

Cluster status:

HOST                                  ADDR          LABELS                    STATUS
ceph-pnataraj-hkpq9z-node1-installer  10.0.209.110  _admin mon installer mgr
ceph-pnataraj-hkpq9z-node2            10.0.208.171  mgr mon
ceph-pnataraj-hkpq9z-node3            10.0.210.54   osd mon
ceph-pnataraj-hkpq9z-node4            10.0.208.133  osd mds
ceph-pnataraj-hkpq9z-node5            10.0.208.218  osd mds
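A rough way to correlate the "can't start new thread" failures with the container PID limit is to watch pids.current against pids.max in the iSCSI containers while LUNs are being added. This is only a monitoring sketch; it assumes the cgroup v1 pids controller paths (as used later in this bug), and the container name filter and polling interval are arbitrary.

while sleep 5; do
    for c in $(podman ps --format '{{.Names}}' | grep -e '-iscsi-'); do
        printf '%s: ' "$c"
        # print "current / max" PIDs for this container
        podman exec "$c" sh -c 'echo "$(cat /sys/fs/cgroup/pids/pids.current) / $(cat /sys/fs/cgroup/pids/pids.max)"'
    done
done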
Yes, node 10.0.208.218 is down. I will reproduce the issue on a fresh cluster and share the system details.
I was able to reproduce the issue with the fix again:

1) Created images using the count option; 6 extra images were created under the target.
2) Deleted the extra images so that there were 256 images per target.
3) Started adding LUNs to the client; at the 201st LUN creation the error below was hit.

/iscsi-target...at:rh7-client> disk add iscsipool/test219
Exception in thread Thread-92:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 919, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 1185, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python3.6/site-packages/gwcli/gateway.py", line 646, in check_gateways
    check_thread.start()
  File "/usr/lib64/python3.6/threading.py", line 849, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

Output: gwcli hangs at the 201st LUN addition, then the exception above is raised. Checking podman ps -a again afterwards, the container was not found.

Snippets of ceph status and podman ps are copied below:
http://pastebin.test.redhat.com/1047363 ---------> image creation snippet
http://pastebin.test.redhat.com/1047393 ---------> output after the issue was hit

Cluster details (ssh cephuser@<ip>, password is cephuser):

ceph-pnataraj-luc1vd-node1-installer  10.0.210.237  _admin installer mgr mon  ----------> bootstrap/admin node
ceph-pnataraj-luc1vd-node2            10.0.209.164  mon mgr
ceph-pnataraj-luc1vd-node3            10.0.210.124  osd mon
ceph-pnataraj-luc1vd-node4            10.0.209.187  osd mds  ------------> iSCSI gateway node 1
ceph-pnataraj-luc1vd-node5            10.0.209.227  osd mds  ------------> iSCSI gateway node 2
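Because the containers are started with --rm, they disappear from podman ps -a once they die, so the remaining evidence ends up in the journal. A hedged sketch for pulling it on the gateway node: the ceph-<fsid>@<daemon> unit naming is an assumption based on how cephadm normally names its systemd units, and the CONTAINER_NAME filter relies on the journald log driver visible in the unit.run snippets below; substitute the real fsid for <fsid>.

journalctl -u "ceph-<fsid>@iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga.service" --since "1 hour ago"
journalctl CONTAINER_NAME="ceph-<fsid>-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga" --since "1 hour ago"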
Hi Adam, I was able to get the unit.run from the new cluster. Please find the snippet below.

[root@ceph-tcmu-iscsi-fotdn1-node4 cephuser]# cd /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/
[root@ceph-tcmu-iscsi-fotdn1-node4 iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu]# ls
config  configfs  iscsi-gateway.cfg  keyring  unit.configured  unit.created  unit.image  unit.meta  unit.poststop  unit.run  unit.stop
[root@ceph-tcmu-iscsi-fotdn1-node4 iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu]# cat unit.run
set -e
if ! grep -qs /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/configfs /proc/mounts; then mount -t configfs none /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/configfs; fi
# iscsi tcmu-runner container
! /bin/podman rm -f ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu-tcmu 2> /dev/null
! /bin/podman rm -f ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi-iscsipool-ceph-tcmu-iscsi-fotdn1-node4-aicryu-tcmu 2> /dev/null
! /bin/podman rm -f --storage ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi-iscsipool-ceph-tcmu-iscsi-fotdn1-node4-aicryu-tcmu 2> /dev/null
! /bin/podman rm -f --storage ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu-tcmu 2> /dev/null
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/tcmu-runner --privileged --group-add=disk --init --name ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi-iscsipool-ceph-tcmu-iscsi-fotdn1-node4-aicryu-tcmu --pids-limit=-1 -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87 -e NODE_NAME=ceph-tcmu-iscsi-fotdn1-node4 -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/config:/etc/ceph/ceph.conf:z -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/keyring:/etc/ceph/keyring:z -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/iscsi-gateway.cfg:/etc/ceph/iscsi-gateway.cfg:z -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/configfs:/sys/kernel/config -v /var/log/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286:/var/log:z -v /dev:/dev --mount type=bind,source=/lib/modules,destination=/lib/modules,ro=true registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87 &
# iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu
! /bin/podman rm -f ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu 2> /dev/null
! /bin/podman rm -f ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi-iscsipool-ceph-tcmu-iscsi-fotdn1-node4-aicryu 2> /dev/null
! /bin/podman rm -f --storage ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi-iscsipool-ceph-tcmu-iscsi-fotdn1-node4-aicryu 2> /dev/null
! /bin/podman rm -f --storage ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu 2> /dev/null
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/rbd-target-api --privileged --group-add=disk --init --name ceph-82e7533c-c54d-11ec-b555-fa163e2c8286-iscsi-iscsipool-ceph-tcmu-iscsi-fotdn1-node4-aicryu -d --log-driver journald --conmon-pidfile /run/ceph-82e7533c-c54d-11ec-b555-fa163e2c8286.ceph-tcmu-iscsi-fotdn1-node4.aicryu.service-pid --cidfile /run/ceph-82e7533c-c54d-11ec-b555-fa163e2c8286.ceph-tcmu-iscsi-fotdn1-node4.aicryu.service-cid --pids-limit=-1 --cgroups=split -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87 -e NODE_NAME=ceph-tcmu-iscsi-fotdn1-node4 -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/config:/etc/ceph/ceph.conf:z -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/keyring:/etc/ceph/keyring:z -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/iscsi-gateway.cfg:/etc/ceph/iscsi-gateway.cfg:z -v /var/lib/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286/iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu/configfs:/sys/kernel/config -v /var/log/ceph/82e7533c-c54d-11ec-b555-fa163e2c8286:/var/log:z -v /dev:/dev --mount type=bind,source=/lib/modules,destination=/lib/modules,ro=true registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87
[root@ceph-tcmu-iscsi-fotdn1-node4 iscsi.iscsipool.ceph-tcmu-iscsi-fotdn1-node4.aicryu]#
Output from the cluster where the issue was seen:

[root@ceph-pnataraj-luc1vd-node4 cephuser]# cd /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/
[root@ceph-pnataraj-luc1vd-node4 iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga]# cat unit.run
set -e
if ! grep -qs /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/configfs /proc/mounts; then mount -t configfs none /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/configfs; fi
# iscsi tcmu-runner container
! /bin/podman rm -f ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga-tcmu 2> /dev/null
! /bin/podman rm -f ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga-tcmu 2> /dev/null
! /bin/podman rm -f --storage ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga-tcmu 2> /dev/null
! /bin/podman rm -f --storage ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga-tcmu 2> /dev/null
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/tcmu-runner --privileged --group-add=disk --init --name ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga-tcmu --pids-limit=-1 -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87 -e NODE_NAME=ceph-pnataraj-luc1vd-node4 -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/config:/etc/ceph/ceph.conf:z -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/keyring:/etc/ceph/keyring:z -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/iscsi-gateway.cfg:/etc/ceph/iscsi-gateway.cfg:z -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/configfs:/sys/kernel/config -v /var/log/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1:/var/log:z -v /dev:/dev --mount type=bind,source=/lib/modules,destination=/lib/modules,ro=true registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87 &
# iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga
! /bin/podman rm -f ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga 2> /dev/null
! /bin/podman rm -f ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga 2> /dev/null
! /bin/podman rm -f --storage ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga 2> /dev/null
! /bin/podman rm -f --storage ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga 2> /dev/null
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/rbd-target-api --privileged --group-add=disk --init --name ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga -d --log-driver journald --conmon-pidfile /run/ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1.ceph-pnataraj-luc1vd-node4.levnga.service-pid --cidfile /run/ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1.ceph-pnataraj-luc1vd-node4.levnga.service-cid --pids-limit=-1 --cgroups=split -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87 -e NODE_NAME=ceph-pnataraj-luc1vd-node4 -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/config:/etc/ceph/ceph.conf:z -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/keyring:/etc/ceph/keyring:z -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/iscsi-gateway.cfg:/etc/ceph/iscsi-gateway.cfg:z -v /var/lib/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1/iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga/configfs:/sys/kernel/config -v /var/log/ceph/a1dc0a22-c4c0-11ec-87ba-fa163e719dd1:/var/log:z -v /dev:/dev --mount type=bind,source=/lib/modules,destination=/lib/modules,ro=true registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:3ed44ebc6fd5d0d39f52b0c62c72d1147643282abfaa53095d103cec54055f87
[root@ceph-pnataraj-luc1vd-node4 iscsi.iscsipool.ceph-pnataraj-luc1vd-node4.levnga]#
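Both unit.run files pass --pids-limit=-1 to podman. A quick sketch to confirm whether that limit actually took effect on the running containers (assumptions: podman exposes HostConfig.PidsLimit in its inspect output, as Docker does, and the cgroup path assumes cgroup v1 as elsewhere in this bug; container names are the ones generated by cephadm on this node):

podman inspect --format '{{.HostConfig.PidsLimit}}' ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga-tcmu
podman inspect --format '{{.HostConfig.PidsLimit}}' ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga
# the effective limit can also be read from inside the container
podman exec ceph-a1dc0a22-c4c0-11ec-87ba-fa163e719dd1-iscsi-iscsipool-ceph-pnataraj-luc1vd-node4-levnga-tcmu cat /sys/fs/cgroup/pids/pids.max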
Hi Teoman,

Preethi is working on it and she will update soon.

Thanks,
Gopi
We do not have the failed setup any more. I will reproduce the issue on a fresh cluster and share the details once it is done.
The issue is not reproduced with the latest 5.1z1 build; I was able to create 255 LUNs without any issues. Output below:

http://pastebin.test.redhat.com/1052677
http://pastebin.test.redhat.com/1052502
The difference between the failed QA setup and the working setup is VMs versus bare metal: the issue was last reproduced on a cluster built on VMs, and it was not reproduced on the cluster built on bare metal.
The issue is still seen in the latest 5.2 build. As per the discussions, the LUN test is applicable only when pids.max is set to max for both the tcmu and iscsi containers.

ceph version 16.2.8-34.el8cp (da41b2a854c731f3062af5ca7b7aca470b0bec29) pacific (stable)

# podman exec -it db95bf5300f1 cat /sys/fs/cgroup/pids/pids.max
23419
# podman exec -it 758f919a4905 cat /sys/fs/cgroup/pids/pids.max
max
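The two checks above target specific container IDs; a short sketch that prints pids.max for every iSCSI-related container on a gateway node, so the mismatch is visible at a glance (the name filter is an assumption based on the cephadm container naming seen earlier in this bug):

for c in $(podman ps --format '{{.Names}}' | grep -e '-iscsi-'); do
    printf '%s: ' "$c"
    podman exec "$c" cat /sys/fs/cgroup/pids/pids.max
done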
The issue is not observed in the latest 5.2 build. I was able to create 256 LUNs; the containers below are set to max, as checked for verification.

[root@plena001 ubuntu]# podman exec -it b8a0104f4fd9 cat /sys/fs/cgroup/pids/pids.max
max
[root@plena001 ubuntu]# podman exec -it f39915874caf cat /sys/fs/cgroup/pids/pids.max
max

http://pastebin.test.redhat.com/1058710
http://pastebin.test.redhat.com/1058759

[ceph: root@magna021 ceph]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_OK

  services:
    mon:         5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 7h)
    mgr:         magna022.icxgsh(active, since 7h), standbys: magna021.syfuos
    osd:         42 osds: 42 up (since 6h), 42 in (since 8w)
    rbd-mirror:  1 daemon active (1 hosts)
    tcmu-runner: 512 portals active (2 hosts)

  data:
    pools:   11 pools, 801 pgs
    objects: 891.65k objects, 3.4 TiB
    usage:   10 TiB used, 28 TiB / 38 TiB avail
    pgs:     801 active+clean

  io:
    client:   3.5 MiB/s rd, 44 KiB/s wr, 4.49k op/s rd, 83 op/s wr

[ceph: root@magna021 ceph]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-46.el8cp (8300c1ab46e5a5b616a783a729b2248c623a8193) pacific (stable)": 5
    },
    "mgr": {
        "ceph version 16.2.8-46.el8cp (8300c1ab46e5a5b616a783a729b2248c623a8193) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-46.el8cp (8300c1ab46e5a5b616a783a729b2248c623a8193) pacific (stable)": 42
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 16.2.8-46.el8cp (8300c1ab46e5a5b616a783a729b2248c623a8193) pacific (stable)": 1
    },
    "tcmu-runner": {
        "ceph version 16.2.8-46.el8cp (8300c1ab46e5a5b616a783a729b2248c623a8193) pacific (stable)": 512
    },
    "overall": {
        "ceph version 16.2.8-46.el8cp (8300c1ab46e5a5b616a783a729b2248c623a8193) pacific (stable)": 562
    }
}
[ceph: root@magna021 ceph]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997