Bug 2092508 - [iscsi] tcmu container crash when cluster is upgraded from 5.1Z1 to 5.2
Summary: [iscsi] tcmu container crash when cluster is upgraded from 5.1Z1 to 5.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.2
Assignee: Adam King
QA Contact: Preethi
Docs Contact: Anjana Suparna Sriram
URL:
Whiteboard:
Depends On:
Blocks: 2102272
 
Reported: 2022-06-01 17:30 UTC by Preethi
Modified: 2022-08-09 17:38 UTC
CC List: 6 users

Fixed In Version: ceph-16.2.8-45.el8cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-09 17:38:27 UTC
Embargoed:




Links:
Red Hat Issue Tracker RHCEPH-4431 (last updated 2022-06-01 17:33:45 UTC)
Red Hat Product Errata RHSA-2022:5997 (last updated 2022-08-09 17:38:52 UTC)

Description Preethi 2022-06-01 17:30:47 UTC
Description of problem: The tcmu container crashes when the cluster is upgraded from 5.1Z1 to 5.2.


Version-Release number of selected component (if applicable):
[root@magna021 /]# ceph version
ceph version 16.2.8-22.el8cp (857c12016f00d5089a358b3bbc25abf43d84a7a2) pacific (stable)

How reproducible:


Steps to Reproduce:
1. On a 5.1Z1 cluster that has 255 LUNs, upgrade to the latest 5.2 build with the ceph orch upgrade command (see the example after these steps).
2. After the upgrade completes, check ceph status and run podman ps on the iSCSI gateway nodes.
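
For reference, the upgrade in step 1 follows the standard cephadm flow; a typical invocation looks like the following (the image tag is a placeholder, not the exact build used here):

[root@magna021 /]# ceph orch upgrade start --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:<5.2-build-tag>
[root@magna021 /]# ceph orch upgrade status    # poll until the progress item in 'ceph status' completes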

Configuration and cluster status before upgrade:

 
[root@magna021 ubuntu]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_OK
 
  services:
    mon:         5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 4d)
    mgr:         magna021.syfuos(active, since 6d), standbys: magna022.icxgsh
    osd:         42 osds: 42 up (since 6d), 42 in (since 3w)
    rbd-mirror:  1 daemon active (1 hosts)
    tcmu-runner: 510 portals active (2 hosts)
 
  data:
    pools:   11 pools, 417 pgs
    objects: 579.62k objects, 2.2 TiB
    usage:   6.6 TiB used, 32 TiB / 38 TiB avail
    pgs:     417 active+clean
 
  io:
    client:   612 KiB/s rd, 41 KiB/s wr, 731 op/s rd, 77 op/s wr
 
[root@magna021 ubuntu]#

Actual results: The upgrade was successful, but ceph status no longer reports the tcmu-runner status, and podman ps on the iSCSI gateway nodes shows that no tcmu container exists.

[root@plena001 ubuntu]# podman ps -a
CONTAINER ID  IMAGE                                                                                                                         COMMAND               CREATED       STATUS           PORTS       NAMES
9ad27ceb33e9  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n client.crash.p...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-crash-plena001
df9d0c1e7ea8  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.1 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-1
f7ee016dc2f9  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.0 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-0
21415866b400  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.3 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-3
ab3011fdbb05  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.4 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-4
ccacf7be065a  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.2 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-2
e4ed9adeb721  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.6 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-6
71417fc784d3  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce  -n osd.5 -f --set...  36 hours ago  Up 36 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-osd-5
5e5536635ebd  registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.10                                                              --no-collector.ti...  11 hours ago  Up 11 hours ago              ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-node-exporter-plena001
d90c674c4d03  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce                        5 hours ago   Up 5 hours ago               ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor
[root@plena001 ubuntu]# 



Expected results: The tcmu container should be present on each gateway node, and the active portals should be listed in ceph status.
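
A quick way to check for the expected state (illustrative commands; in cephadm deployments the tcmu sidecar is typically the container whose name carries a -tcmu suffix, so the grep pattern is an assumption):

[root@plena001 ubuntu]# podman ps --format '{{.Names}}' | grep tcmu
[root@magna021 /]# ceph status | grep tcmu-runner    # should list the active portals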

Crash logs:

[root@plena001 ubuntu]# journalctl -xeu ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e.plena001.adsjor.service
Jun 01 12:47:42 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test204' to LIO backstore user:rbd
Jun 01 12:47:42 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test204 to LIO
Jun 01 12:47:42 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ano2 for iscsipool.test204 on tpg 2 (state 1, owner False, failover type 1)
Jun 01 12:47:42 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test205' to LIO backstore user:rbd
Jun 01 12:47:42 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test205 to LIO
Jun 01 12:47:42 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ao for iscsipool.test205 on tpg 2 (state 0, owner True, failover type 1)
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test206' to LIO backstore user:rbd
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test206 to LIO
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ano2 for iscsipool.test206 on tpg 2 (state 1, owner False, failover type 1)
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test207' to LIO backstore user:rbd
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test207 to LIO
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ao for iscsipool.test207 on tpg 2 (state 0, owner True, failover type 1)
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test208' to LIO backstore user:rbd
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test208 to LIO
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ano2 for iscsipool.test208 on tpg 2 (state 1, owner False, failover type 1)
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test209' to LIO backstore user:rbd
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test209 to LIO
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ao for iscsipool.test209 on tpg 2 (state 0, owner True, failover type 1)
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test21' to LIO backstore user:rbd
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test21 to LIO
Jun 01 12:47:43 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ao for iscsipool.test21 on tpg 2 (state 0, owner True, failover type 1)
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test210' to LIO backstore user:rbd
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test210 to LIO
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ano2 for iscsipool.test210 on tpg 2 (state 1, owner False, failover type 1)
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test211' to LIO backstore user:rbd
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Successfully added iscsipool/test211 to LIO
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug Setup group ao for iscsipool.test211 on tpg 2 (state 0, owner True, failover type 1)
Jun 01 12:47:44 plena001 ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor[2851943]: debug (LUN.add_dev_to_lio) Adding image 'iscsipool/test2110' to LIO backstore user:rbd
Jun 01 12:47:44 plena001 bash[2851481]: Thread::try_create(): pthread_create failed with error 11/builddir/build/BUILD/ceph-16.2.8/src/common/Thread.cc: In function 'void Thread::create(const char*, size_t)' th>
Jun 01 12:47:44 plena001 bash[2851481]: /builddir/build/BUILD/ceph-16.2.8/src/common/Thread.cc: 165: FAILED ceph_assert(ret == 0)
Jun 01 12:47:44 plena001 bash[2851481]:  ceph version 16.2.8-22.el8cp (857c12016f00d5089a358b3bbc25abf43d84a7a2) pacific (stable)
Jun 01 12:47:44 plena001 bash[2851481]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f9c4252fd48]
Jun 01 12:47:44 plena001 bash[2851481]:  2: /usr/lib64/ceph/libceph-common.so.2(+0x276f62) [0x7f9c4252ff62]
Jun 01 12:47:44 plena001 bash[2851481]:  3: /usr/lib64/ceph/libceph-common.so.2(+0x363127) [0x7f9c4261c127]
Jun 01 12:47:44 plena001 bash[2851481]:  4: (CommonSafeTimer<std::mutex>::init()+0x1fe) [0x7f9c4262278e]
Jun 01 12:47:44 plena001 bash[2851481]:  5: (MgrClient::init()+0x40) [0x7f9c428f2570]
Jun 01 12:47:44 plena001 bash[2851481]:  6: /lib64/librados.so.2(+0xbbdb1) [0x7f9c4b769db1]
Jun 01 12:47:44 plena001 bash[2851481]:  7: rados_connect()
Jun 01 12:47:44 plena001 bash[2851481]:  8: /usr/lib64/tcmu-runner/handler_rbd.so(+0x3831) [0x7f9c4c530831]
Jun 01 12:47:44 plena001 bash[2851481]:  9: /usr/lib64/tcmu-runner/handler_rbd.so(+0x48e8) [0x7f9c4c5318e8]
Jun 01 12:47:44 plena001 bash[2851481]:  10: /usr/bin/tcmu-runner() [0x414ef1]
Jun 01 12:47:44 plena001 bash[2851481]:  11: /lib64/libtcmu.so.2(+0xb576) [0x7f9c52d52576]
Jun 01 12:47:44 plena001 bash[2851481]:  12: /lib64/libtcmu.so.2(+0xa7b3) [0x7f9c52d517b3]
Jun 01 12:47:44 plena001 bash[2851481]:  13: /lib64/libnl-genl-3.so.200(+0x3fb5) [0x7f9c515d3fb5]
Jun 01 12:47:44 plena001 bash[2851481]:  14: nl_recvmsgs_report()
Jun 01 12:47:44 plena001 bash[2851481]:  15: nl_recvmsgs()
Jun 01 12:47:44 plena001 bash[2851481]:  16: tcmulib_master_fd_ready()
Jun 01 12:47:44 plena001 bash[2851481]:  17: /usr/bin/tcmu-runner() [0x4133de]
Jun 01 12:47:44 plena001 bash[2851481]:  18: g_main_context_dispatch()
Jun 01 12:47:44 plena001 bash[2851481]:  19: /lib64/libglib-2.0.so.0(+0x4dd18) [0x7f9c5247ed18]
Jun 01 12:47:44 plena001 bash[2851481]:  20: g_main_loop_run()
Jun 01 12:47:44 plena001 bash[2851481]:  21: /usr/bin/tcmu-runner() [0x415c5b]
Jun 01 12:47:44 plena001 bash[2851481]:  22: __libc_start_main()
Jun 01 12:47:44 plena001 bash[2851481]:  23: /usr/bin/tcmu-runner() [0x40817e]
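
The key line above is pthread_create() failing with error 11 (EAGAIN): tcmu-runner could not spawn another thread while registering yet another user:rbd backstore, which inside a container most commonly means the pids cgroup limit was exhausted (each rados_connect() spins up several librados threads). A way to confirm the limit on the gateway node (hypothetical diagnostic session; the container ID is the one from the podman ps output above, and output is omitted):

[root@plena001 ubuntu]# podman inspect --format '{{.HostConfig.PidsLimit}}' d90c674c4d03
[root@plena001 ubuntu]# podman exec d90c674c4d03 cat /sys/fs/cgroup/pids/pids.max    # cgroup v1 path; compare against pids.current under load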


Additional info:

magna021 - root/q
plena001 - iscsi gateway node1
plena002 - iscsi gateway node2




ceph status post upgrade:
[root@magna021 /]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_WARN
            Redeploying daemon node-exporter.magna021 on host magna021 failed.
 
  services:
    mon:        5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 12h)
    mgr:        magna021.syfuos(active, since 35h), standbys: magna022.icxgsh
    osd:        42 osds: 42 up (since 35h), 42 in (since 6w)
    rbd-mirror: 1 daemon active (1 hosts)
 
  data:
    pools:   14 pools, 897 pgs
    objects: 882.61k objects, 3.3 TiB
    usage:   10 TiB used, 28 TiB / 38 TiB avail
    pgs:     897 active+clean
 
  io:
    client:   654 KiB/s rd, 50 KiB/s wr, 779 op/s rd, 87 op/s wr
 
  progress:
    Upgrade to 16.2.8-22.el8cp (21m)
      [======================......] (remaining: 5m)
 
[root@magna021 /]# ceph health detail
HEALTH_WARN Redeploying daemon node-exporter.magna021 on host magna021 failed.
[WRN] UPGRADE_REDEPLOY_DAEMON: Redeploying daemon node-exporter.magna021 on host magna021 failed.
    Upgrade daemon: node-exporter.magna021: cephadm exited with an error code: 1, stderr:Redeploy daemon node-exporter.magna021 ...
Non-zero exit code 1 from systemctl start ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
systemctl: stderr Job for ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e.service failed because the control process exited with error code.
systemctl: stderr See "systemctl status ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 9121, in <module>
    main()
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 9109, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 1970, in _default_image
    return func(ctx)
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 5033, in command_deploy
    ports=daemon_ports)
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 2925, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 3170, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "/var/lib/ceph/c8ce6d50-c0a1-11ec-a99b-002590fc2a2e/cephadm.259b6943093ee438dfe0837018f826fc5bc47fd2a4f5b27678a18e8e65a29026", line 1637, in call_throws
    raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
RuntimeError: Failed command: systemctl start ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e: Job for ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e.service failed because the control process exited with error code.
See "systemctl status ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e.service" and "journalctl -xe" for details.

[root@magna021 /]# 

ceph orch ls:
[root@magna021 /]# ceph orch ls
NAME             PORTS        RUNNING  REFRESHED  AGE  PLACEMENT                                     
alertmanager     ?:9093,9094      1/1  2m ago     6w   count:1                                       
crash                           14/14  5m ago     6w   *                                             
grafana          ?:3000           1/1  2m ago     6w   count:1                                       
iscsi.iscsipool                   2/2  2m ago     2w   plena001;plena002                             
mgr                               2/2  2m ago     6w   count:2                                       
mon                               5/5  2m ago     3w   magna021;magna022;magna024;magna025;magna026  
node-exporter    ?:9100         14/14  5m ago     6w   *                                             
osd                                42  2m ago     -    <unmanaged>                                   
prometheus       ?:9095           1/1  2m ago     6w   count:1                                       
rbd-mirror                        1/1  2m ago     3w   magna026                                      
[root@magna021 /]# 
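
Note that ceph orch ls still shows iscsi.iscsipool as 2/2 even though the tcmu containers are gone; the count appears to track only the iscsi daemons themselves, not the tcmu sidecars. For a per-daemon view (standard orchestrator CLI; output omitted):

[root@magna021 /]# ceph orch ps --daemon-type iscsi
[root@magna021 /]# ceph orch ps plena001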


d90c674c4d03  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:691c0eb6ddd228e9ddf6999bbb3903c7312d67f909377d614e1fa27a279a2cce                        5 hours ago   Up 5 hours ago               ceph-c8ce6d50-c0a1-11ec-a99b-002590fc2a2e-iscsi-iscsipool-plena001-adsjor
[root@plena001 ubuntu]# podman exec -it d90c674c4d03 /bin/bash
[root@plena001 /]# gwcli
Warning: Could not load preferences file /root/.gwcli/prefs.bin.
REST API failure, code : 500
Unable to access the configuration object
AttributeError: 'Settings' object has no attribute 'api_endpoint'
[root@plena001 /]# ls

Comment 1 Preethi 2022-06-03 05:32:23 UTC
Ceph version of 5.1Z1 - ceph version 16.2.7-112.el8cp

Comment 4 Preethi 2022-06-08 02:51:58 UTC
@Adam, the issue still exists with the latest 5.2 builds; see the update in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1976128. As per the discussions, this could be verified once we have a fix for the dependent BZ. However, I was still able to reproduce the issue when I upgraded to the latest builds. I believe we cannot verify this until the fix lands, i.e., until pids.max is set for both the tcmu and iscsi containers.

Comment 5 Preethi 2022-06-08 14:34:25 UTC
As per the discussions, I moved the issue back to the ASSIGNED state. We cannot run the LUN test until the fix for BZ 1976128 (raising the PIDs limit to max for both the iSCSI and tcmu containers) is available; since the upgrade scenario again depends on 255 LUNs, this bug cannot be verified until that fix lands.
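
For context, the default podman pids limit (2048 in many builds) is well below what several hundred user:rbd backstores need. Conceptually, the fix being waited on amounts to deploying the iscsi/tcmu containers without that cap, along these lines (illustrative flag usage, not the exact cephadm-generated command):

podman run --pids-limit=-1 ... <rhceph-image>    # -1 removes the pids cgroup cap for the container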

Comment 10 Preethi 2022-06-19 17:58:21 UTC
The issue is not seen with the latest 5.2 builds, where the fix is implemented.

Snippet after the upgrade:
[ceph: root@magna021 /]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_OK
 
  services:
    mon:         5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 19m)
    mgr:         magna022.icxgsh(active, since 22m), standbys: magna021.syfuos
    osd:         42 osds: 42 up (since 6m), 42 in (since 8w)
    rbd-mirror:  1 daemon active (1 hosts)
    tcmu-runner: 512 portals active (2 hosts)
 
  data:
    pools:   11 pools, 801 pgs
    objects: 891.65k objects, 3.4 TiB
    usage:   10 TiB used, 28 TiB / 38 TiB avail
    pgs:     801 active+clean
 
  io:
    client:   218 KiB/s rd, 231 KiB/s wr, 59 op/s rd, 56 op/s wr
 
[ceph: root@magna021 /]# ceph version
ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)
[ceph: root@magna021 /]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)": 5
    },
    "mgr": {
        "ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)": 42
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)": 1
    },
    "tcmu-runner": {
        "ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)": 512
    },
    "overall": {
        "ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable)": 562
    }
}
[ceph: root@magna021 /]#

Comment 15 errata-xmlrpc 2022-08-09 17:38:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997

