Description of problem:

The customer upgraded to the latest Ceph 5.3z to resolve the initial segfault they had in tcmu-runner, but is still seeing issues running Ceph and still seeing segfaults. The ework-thread segfaults during load tests:

Apr 27 08:15:01 esg4wng02202 kernel: [138236.446505] ework-thread[1960297]: segfault at b0 ip 00007fe9401b9194 sp 00007fda61ce49b8 error 4 in librados.so.2.0.0[7fe940107000+1a5000]
Apr 27 08:15:01 esg4wng02202 kernel: [138236.446785] Code: 41 5c e9 3f 30 f8 ff 0f 1f 80 00 00 00 00 5b 5d 41 5c c3 f3 0f 1e fa 48 89 c3 e9 1f a1 f8 ff 90 66 0f 1f 44 00 00 f3 0f 1e fa <83> bf b0 00 00 00 02 75 13 48 81 c7 a8 07 00 00 e9 c7 2f f8 ff 0f

Version-Release number of selected component (if applicable):

How reproducible:
Happens a few hours after the load test is started with Oracle RAC. There are no blocklist issues.

Steps to Reproduce:
1. Start the load test
2. Segfaults happen and then sometimes deadlocks
3.

Actual results:
Fails the test with the ework-thread segfaulting, which then causes deadlocks in the kernel.

Expected results:
Does not segfault.

Additional info:
Once again I have struggled to open the core file. I rebuilt Ceph from scratch and installed the debuginfos (which are now available), but I had issues.

Core file is here: https://people.redhat.com/loberman/customer/.core/

All the packages on my debug host are installed, so I am not sure what I am doing wrong there.

[loberman@segstorage3 ~]$ sudo rpm -qa | grep -i tcm
libtcmu-1.5.4-7.el8.x86_64
tcmu-runner-debuginfo-1.5.4-7.el8.x86_64
tcmu-runner-debugsource-1.5.4-7.el8.x86_64
libtcmu-debuginfo-1.5.4-7.el8.x86_64
tcmu-runner-1.5.4-7.el8.x86_64
ceph-common-debuginfo-16.2.10-160.el8.x86_64
ceph-radosgw-debuginfo-16.2.10-160.el8.x86_64
librgw2-debuginfo-16.2.10-160.el8.x86_64
kernel-debuginfo-4.18.0-425.13.1.el8_7.x86_64
ceph-immutable-object-cache-debuginfo-16.2.10-160.el8.x86_64
libcephsqlite-debuginfo-16.2.10-160.el8.x86_64
python3-rbd-debuginfo-16.2.10-160.el8.x86_64
tcmu-runner-debuginfo-1.5.4-7.el8.x86_64
ceph-fuse-debuginfo-16.2.10-160.el8.x86_64
ceph-mon-debuginfo-16.2.10-160.el8.x86_64
libcephfs2-debuginfo-16.2.10-160.el8.x86_64
libradosstriper1-debuginfo-16.2.10-160.el8.x86_64
python3-rados-debuginfo-16.2.10-160.el8.x86_64
rbd-mirror-debuginfo-16.2.10-160.el8.x86_64
elfutils-debuginfod-client-0.187-4.el8.x86_64
cephfs-mirror-debuginfo-16.2.10-160.el8.x86_64
ceph-mgr-debuginfo-16.2.10-160.el8.x86_64
ceph-test-debuginfo-16.2.10-160.el8.x86_64
librados-devel-debuginfo-16.2.10-160.el8.x86_64
python3-cephfs-debuginfo-16.2.10-160.el8.x86_64
rbd-fuse-debuginfo-16.2.10-160.el8.x86_64
libtcmu-debuginfo-1.5.4-7.el8.x86_64
kernel-debuginfo-common-x86_64-4.18.0-425.13.1.el8_7.x86_64
ceph-mds-debuginfo-16.2.10-160.el8.x86_64
librados2-debuginfo-16.2.10-160.el8.x86_64
python3-rgw-debuginfo-16.2.10-160.el8.x86_64
elfutils-debuginfod-client-devel-0.187-4.el8.x86_64
ceph-base-debuginfo-16.2.10-160.el8.x86_64
ceph-osd-debuginfo-16.2.10-160.el8.x86_64
librbd1-debuginfo-16.2.10-160.el8.x86_64
rbd-nbd-debuginfo-16.2.10-160.el8.x86_64

Full log
---------
Apr 27 08:14:59 esg4wng02202 tcmu-runner: tcmu_rbd_close:1240 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: appended blocklist entry: {192.168.5.35:0/4245534485}
Apr 27 08:14:59 esg4wng02202 tcmu-runner: tcmu_rbd_rm_stale_entries_from_blocklist:382 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: removing blocklist entry: {192.168.5.35:0/2254410244}
Apr 27 08:15:00 esg4wng02202 tcmu-runner: tcmu_rbd_rm_stale_entries_from_blocklist:382 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: removing blocklist entry: {192.168.5.35:0/4245534485}
Apr 27 08:15:00 esg4wng02202 tcmu-runner: tcmu_notify_lock_lost:273 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN01: Async lock drop. Old state 5
Apr 27 08:15:00 esg4wng02202 tcmu-runner: alua_implicit_transition:592 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN01: Starting write lock acquisition operation.
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_rbd_open:1209 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: address: {192.168.5.35:0/2863440141}
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_rbd_close:1240 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN01: appended blocklist entry: {192.168.5.35:0/1426901708}
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_rbd_open:1209 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN02: address: {192.168.5.35:0/3447570735}
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_acquire_dev_lock:488 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: Read lock acquisition successful
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_acquire_dev_lock:488 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN02: Read lock acquisition successful
Apr 27 08:15:01 esg4wng02202 kernel: [138236.446505] ework-thread[1960297]: segfault at b0 ip 00007fe9401b9194 sp 00007fda61ce49b8 error 4 in librados.so.2.0.0[7fe940107000+1a5000]
Apr 27 08:15:01 esg4wng02202 kernel: [138236.446785] Code: 41 5c e9 3f 30 f8 ff 0f 1f 80 00 00 00 00 5b 5d 41 5c c3 f3 0f 1e fa 48 89 c3 e9 1f a1 f8 ff 90 66 0f 1f 44 00 00 f3 0f 1e fa <83> bf b0 00 00 00 02 75 13 48 81 c7 a8 07 00 00 e9 c7 2f f8 ff 0f
Apr 27 08:16:01 esg4wng02202 kernel: [138296.879907] ABORT_TASK: Found referenced iSCSI task_tag: 70
Apr 27 08:16:08 esg4wng02202 kernel: [138303.778468] ABORT_TASK: Found referenced iSCSI task_tag: 42
Apr 27 08:16:14 esg4wng02202 kernel: [138309.872171] ABORT_TASK: Found referenced iSCSI task_tag: 38
Apr 27 08:16:16 esg4wng02202 kernel: [138311.970428] ABORT_TASK: Found referenced iSCSI task_tag: 55
Apr 27 08:16:33 esg4wng02202 kernel: [138328.977940] iSCSI Login timeout on Network Portal 192.168.55.243:3260
Apr 27 08:16:41 esg4wng02202 kernel: [138336.657922] iSCSI Login timeout on Network Portal 192.168.56.243:3260
Apr 27 08:16:41 esg4wng02202 kernel: [138336.658289] tx_data returned -32, expecting 48.
Apr 27 08:16:41 esg4wng02202 kernel: [138336.658641] iSCSI Login negotiation failed.
Apr 27 08:16:57 esg4wng02202 kernel: [138352.017883] iSCSI Login timeout on Network Portal 192.168.56.243:3260
[root@esg4wng02202 ~]#

These are all still the older version 16.2.0-117.el8cp in the SOS report:

cephadm-16.2.0-117.el8cp.noarch                Tue Oct 25 10:03:40 2022
ceph-common-16.2.0-117.el8cp.x86_64            Tue Oct 25 10:03:44 2022
libcephfs2-16.2.0-117.el8cp.x86_64             Fri Oct 21 09:58:17 2022
python3-cephfs-16.2.0-117.el8cp.x86_64         Fri Oct 21 09:58:17 2022
python3-ceph-argparse-16.2.0-117.el8cp.x86_64  Fri Oct 21 09:58:17 2022
python3-ceph-common-16.2.0-117.el8cp.x86_64    Fri Oct 21 09:58:17 2022

But after the upgrade:

[loberman@lobefedora sosreport-esg4wng02201-03477154-2023-04-27-ewrzglz]$ cat sos_commands/ceph/ceph_versions
{
    "mon": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 96
    },
    "mds": {},
    "tcmu-runner": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 32
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 133
    }
}

Likely everything is upgraded, but this is not a good thing for support folks. The upgrade does not seem to happen via rpm package updates, which is why the rpm data still shows the old versions.
Raimund noticed this. It was at the top of my screen and I missed it. It makes sense now why we cannot open the core file. We will check with the customer.

Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/tcmu-runner...Reading symbols from /usr/lib/debug/usr/bin/tcmu-runner-1.5.4-7.el8.x86_64.debug...done.
done.
BFD: warning: /home/loberman/core.tcmu-runner.0.7b56bb3deb88430fa96e13c4fc7ad1ed.1152774.1682583301000000 is truncated: expected core file size >= 4912242688, found: 2147483648
We are still waiting on a valid core from the customer. The first one was truncated.

Regards,
Laurence
I am not sure if I have the backtrace right yet, using my test node to debug, but to me it looks like the cluster pointer is NULL:

(gdb) up
#1  0x00007f7d3bcece7e in rados_service_update_status () from /lib64/librados.so.2
(gdb) up
#2  0x00007f7d3cb3546e in tcmu_rbd_service_status_update (dev=0x41a7600, has_lock=false) at /usr/src/debug/tcmu-runner-1.5.4-7.el8.x86_64/rbd.c:137
137             ret = rados_service_update_status(state->cluster, status_buf);
(gdb) p state->cluster
$1 = (rados_t) 0x0

static int tcmu_rbd_service_status_update(struct tcmu_device *dev,
                                          bool has_lock)
{
        struct tcmu_rbd_state *state = tcmur_dev_get_private(dev);
        struct tcmur_device *rdev = tcmu_dev_get_private(dev);
        char *status_buf = NULL;
        int ret;

        ret = asprintf(&status_buf,
                       "%s%c%s%c%s%c%"PRIu64"%c%s%c%"PRIu64"%c%s%c%"PRIu64"%c",
                       "lock_owner", '\0', has_lock ? "true" : "false", '\0',
                       "lock_lost_cnt", '\0', rdev->lock_lost_cnt, '\0',
                       "conn_lost_cnt", '\0', rdev->conn_lost_cnt, '\0',
                       "cmd_timed_out_cnt", '\0', rdev->cmd_timed_out_cnt, '\0');
        if (ret < 0) {
                tcmu_dev_err(dev, "Could not allocate status buf. Service will not be updated.\n");
                return ret;
        }

        ret = rados_service_update_status(state->cluster, status_buf);
        if (ret < 0) {
                tcmu_dev_err(dev, "Could not update service status. (Err %d)\n", ret);
        }

        free(status_buf);
        return ret;
}
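For context, a minimal standalone sketch of the librados service-status calls involved (this is not the tcmu-runner code; the client id, conf path, daemon name and status keys are assumptions for illustration). The point is that the rados_t handle passed to rados_service_update_status() has to still be live; the gdb output above shows it being called with state->cluster == 0x0, which is the kind of call that crashes inside librados.so.

#include <rados/librados.h>

int main(void)
{
        rados_t cluster;

        /* Connect as client.admin using the default conf path (assumed). */
        if (rados_create(&cluster, "admin") < 0)
                return 1;
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
        if (rados_connect(cluster) < 0)
                return 1;

        /* Register a daemon instance and push a status dict of
         * NUL-separated key/value pairs, i.e. the same librados calls and
         * buffer format that tcmu_rbd_service_status_update() uses. */
        rados_service_register(cluster, "tcmu-runner", "example-daemon", "");
        rados_service_update_status(cluster, "lock_owner\0false\0");

        rados_shutdown(cluster);

        /* Calling rados_service_update_status() here, after shutdown, or
         * with a NULL/freed handle as in the backtrace above, dereferences
         * invalid client state and segfaults inside librados. */
        return 0;
}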
tcmur_device: skip reporting events if the device is closed

After the rbd device is blocklisted it will lose its connection and at the same time all of its IOs will time out, while tcmu-runner will report a 'lock_lost_cnt' event to the ceph cluster. If the device is reopened while this is happening, it causes a use-after-free bug.

Notes:
When reopening the device it will hold the 'rdev->rdev_lock' lock and then clear the 'TCMUR_DEV_FLAG_IS_OPEN' flag. We can check this flag and just skip reporting events if the device is closed (sketched below).

So NULL means the device is closed?

We need to understand why we were blocklisted, though. That should only happen if we have access to the same LUN from two gateways at the same time.

@Raimund, let's also get the logs available so we can check the blocklisting.
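To make the ordering concrete, here is a small standalone analogue of that guard. It is not the tcmu-runner patch: the struct, function names and lock type are stand-ins for rdev->rdev_lock, the TCMUR_DEV_FLAG_IS_OPEN flag and the rados connection. The event reporter checks the open flag under the same lock the close/reopen path takes before freeing the connection, so a late event is dropped instead of touching freed state.

#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dev_state {
        pthread_mutex_t lock;   /* stands in for rdev->rdev_lock */
        bool is_open;           /* stands in for TCMUR_DEV_FLAG_IS_OPEN */
        char *cluster;          /* stands in for state->cluster */
};

static void report_event(struct dev_state *d)
{
        pthread_mutex_lock(&d->lock);
        if (!d->is_open) {
                /* Device closed: its connection is gone, skip the update. */
                pthread_mutex_unlock(&d->lock);
                return;
        }
        printf("service status update via %s\n", d->cluster);
        pthread_mutex_unlock(&d->lock);
}

static void close_dev(struct dev_state *d)
{
        pthread_mutex_lock(&d->lock);
        d->is_open = false;     /* flag cleared before the connection is freed */
        free(d->cluster);
        d->cluster = NULL;
        pthread_mutex_unlock(&d->lock);
}

int main(void)
{
        struct dev_state d = { PTHREAD_MUTEX_INITIALIZER, true, strdup("rados-conn") };

        report_event(&d);       /* reported: device still open */
        close_dev(&d);
        report_event(&d);       /* skipped: no use-after-free on d.cluster */
        return 0;
}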
Chatted with Raimund
Hello @
Hello @rsachere and @loberman,

FYI, I'd be cautious about using the blocklist as anything of value. There was a site-down case where a blocklist entry was thought to be part of the cause, and it turned out to have nothing to do with it.

Raimund, ping HKlein if you want to know more.

Best regards,
Manny
(In reply to loberman from comment #14)
> tcmur_device: skip reporting events if the device is closed
>
> After the rbd device is blocklisted it will lose its connection and at the
> same time all of its IOs will time out, while tcmu-runner will report a
> 'lock_lost_cnt' event to the ceph cluster. If the device is reopened while
> this is happening, it causes a use-after-free bug.
>
> Notes:
> When reopening the device it will hold the 'rdev->rdev_lock' lock and then
> clear the 'TCMUR_DEV_FLAG_IS_OPEN' flag. We can check this flag and just
> skip reporting events if the device is closed.
>
> So NULL means the device is closed?

No, it means the network connection entity with RADOS is freed after the device is closed.

> We need to understand why we were blocklisted, though. That should only
> happen if we have access to the same LUN from two gateways at the same
> time.
>
> @Raimund, let's also get the logs available so we can check the blocklisting.

Yeah, for tcmu-runner there will always be only one active gateway per LUN, so when another gateway tries to access the Ceph RADOS cluster, that gateway will try to acquire the rbd exclusive lock, and RADOS will then blocklist the current lock owner.

Please make sure the multipath config is correct on the initiator side; currently ceph-iscsi cannot do real active/active, so the customer should make sure the path selection policy is ALUA.

Thanks.
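To illustrate that lock interaction, here is a minimal librbd sketch of what a competing client effectively does (this is not ceph-iscsi or tcmu-runner code; the client id, conf path, pool name and image name are assumptions). A second client acquiring the exclusive lock displaces the current owner, and per the explanation above the displaced owner ends up blocklisted, which matches the "appended blocklist entry" lines in the tcmu-runner log.

#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
        rados_t cluster;
        rados_ioctx_t ioctx;
        rbd_image_t image;

        if (rados_create(&cluster, "admin") < 0)            /* assumed client id */
                return 1;
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
        if (rados_connect(cluster) < 0)
                return 1;
        if (rados_ioctx_create(cluster, "rbdpool", &ioctx) < 0)   /* assumed pool */
                return 1;
        if (rbd_open(ioctx, "test-image", &image, NULL) < 0)      /* assumed image */
                return 1;

        /* Take over the rbd exclusive lock from this client; per the comment
         * above, the previously owning gateway is blocklisted and loses its
         * librados connection as a result. */
        if (rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE) == 0)
                printf("exclusive lock acquired by this client\n");

        rbd_close(image);
        rados_ioctx_destroy(ioctx);
        rados_shutdown(cluster);
        return 0;
}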
@xiubo Li, request you to provide the complete test steps to verify the BZ.
Hi Xiubo Li,

Following are the test steps performed for verifying the BZ - no issue was seen during the test.

1, Setup a ceph-iscsi cluster, which should include 2 gateways, and please enable the debug logs in the tcmu-runner services in these two gateways
2, In the initiator side just use fio or other test tools to write raw IOs to these two '/dev/sd{x}' scsi devices directly at the same time
3, Run the test in Step 2 for minutes or a longer time to see whether you can see the crash
4, If you hit any crash again please provide the debug logs

NOTE -
1) IOs against the same image via different gateway paths, i.e. /dev/sda and /dev/sdb, were performed for a few hours - the issue was not seen.
2) Ran FIO against random LUNs for 5-6 hours and no issues were seen.

FYI, debug logging was enabled and there was no segfault/crash or any error noticed in the tcmu logs.

ceph version and tcmu snippet:

[root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# rpm -qa | grep tcmu*
[root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# ceph version
ceph version 16.2.10-171.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)

[root@ceph-amk-upgrade-53-ga-s8g9v2-node5 /]# rpm -qa | grep tcmu*
tcmu-runner-1.5.4-8.el8cp.x86_64
libtcmu-1.5.4-8.el8cp.x86_64

FIO workload used -
fio --name=test-1 --numjobs=1 --rw=write/randwrite --bs=4/8/32k --iodepth=8 --fsync=32 --time_based --group_reporting --ioengine=libaio --filename=/dev/sda
(In reply to Preethi from comment #46)
> Hi Xiubo Li,
>
> Following are the test steps performed for verifying the BZ - no issue was
> seen during the test.
>
> 1, Setup a ceph-iscsi cluster, which should include 2 gateways, and please
> enable the debug logs in the tcmu-runner services in these two gateways
> 2, In the initiator side just use fio or other test tools to write raw IOs
> to these two '/dev/sd{x}' scsi devices directly at the same time
> 3, Run the test in Step 2 for minutes or a longer time to see whether you
> can see the crash
> 4, If you hit any crash again please provide the debug logs
>
> NOTE -
> 1) IOs against the same image via different gateway paths, i.e. /dev/sda
> and /dev/sdb, were performed for a few hours - the issue was not seen.
> 2) Ran FIO against random LUNs for 5-6 hours and no issues were seen.
>
> FYI, debug logging was enabled and there was no segfault/crash or any error
> noticed in the tcmu logs.
>
> ceph version and tcmu snippet:
>
> [root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# rpm -qa | grep tcmu*
> [root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# ceph version
> ceph version 16.2.10-171.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)
>
> [root@ceph-amk-upgrade-53-ga-s8g9v2-node5 /]# rpm -qa | grep tcmu*
> tcmu-runner-1.5.4-8.el8cp.x86_64
> libtcmu-1.5.4-8.el8cp.x86_64
>
> FIO workload used -
> fio --name=test-1 --numjobs=1 --rw=write/randwrite --bs=4/8/32k --iodepth=8
> --fsync=32 --time_based --group_reporting --ioengine=libaio
> --filename=/dev/sda

Nice work, Preethi. Thanks very much for your tests.

- Xiubo
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3259