Bug 2190288 - CEPH version 16.2.10-160.el8.x86_64: librados segfault in ework-thread librados.so.2.0.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: iSCSI
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 5.3z3
Assignee: Xiubo Li
QA Contact: Preethi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-04-27 18:54 UTC by loberman
Modified: 2023-06-22 12:29 UTC
CC: 9 users

Fixed In Version: tcmu-runner-1.5.4-8.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-23 00:19:11 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-6584 0 None None None 2023-04-27 20:19:56 UTC
Red Hat Product Errata RHBA-2023:3259 0 None None None 2023-05-23 00:19:41 UTC

Internal Links: 2196229

Description loberman 2023-04-27 18:54:57 UTC
Description of problem:
The customer upgraded to the latest Ceph 5.3z to resolve the initial segfault they had in tcmu-runner and is still seeing issues and segfaults running Ceph.

The ework-thread segfaults during load tests

Apr 27 08:15:01 esg4wng02202 kernel: [138236.446505] ework-thread[1960297]: segfault at b0 ip 00007fe9401b9194 sp 00007fda61ce49b8 error 4 in librados.so.2.0.0[7fe940107000+1a5000]
Apr 27 08:15:01 esg4wng02202 kernel: [138236.446785] Code: 41 5c e9 3f 30 f8 ff 0f 1f 80 00 00 00 00 5b 5d 41 5c c3 f3 0f 1e fa 48 89 c3 e9 1f a1 f8 ff 90 66 0f 1f 44 00 00 f3 0f 1e fa <83> bf b0 00 00 00 02 75 13 48 81 c7 a8 07 00 00 e9 c7 2f f8 ff 0f

Version-Release number of selected component (if applicable):


How reproducible:
Happens a few hours after the load test is started with Oracle RAC.
There are no blocklist issues.

Steps to Reproduce:
1. Start the load test
2. Segfaults happen and then sometimes deadlocks

Actual results:
The test fails with the ework-thread segfaulting, which then causes deadlocks in the kernel.

Expected results:
Does not segfault

Additional info:
Once again I have struggled to open the core file.
I rebuilt Ceph from scratch and installed the debuginfos (which are now available), but I had issues.

Core file is here
https://people.redhat.com/loberman/customer/.core/

All the packages on my debug host are installed, so I am not sure what I am doing wrong there.

[loberman@segstorage3 ~]$ sudo rpm -qa | grep -i tcm
libtcmu-1.5.4-7.el8.x86_64
tcmu-runner-debuginfo-1.5.4-7.el8.x86_64
tcmu-runner-debugsource-1.5.4-7.el8.x86_64
libtcmu-debuginfo-1.5.4-7.el8.x86_64
tcmu-runner-1.5.4-7.el8.x86_64

ceph-common-debuginfo-16.2.10-160.el8.x86_64
ceph-radosgw-debuginfo-16.2.10-160.el8.x86_64
librgw2-debuginfo-16.2.10-160.el8.x86_64
kernel-debuginfo-4.18.0-425.13.1.el8_7.x86_64
ceph-immutable-object-cache-debuginfo-16.2.10-160.el8.x86_64
libcephsqlite-debuginfo-16.2.10-160.el8.x86_64
python3-rbd-debuginfo-16.2.10-160.el8.x86_64
tcmu-runner-debuginfo-1.5.4-7.el8.x86_64
ceph-fuse-debuginfo-16.2.10-160.el8.x86_64
ceph-mon-debuginfo-16.2.10-160.el8.x86_64
libcephfs2-debuginfo-16.2.10-160.el8.x86_64
libradosstriper1-debuginfo-16.2.10-160.el8.x86_64
python3-rados-debuginfo-16.2.10-160.el8.x86_64
rbd-mirror-debuginfo-16.2.10-160.el8.x86_64
elfutils-debuginfod-client-0.187-4.el8.x86_64
cephfs-mirror-debuginfo-16.2.10-160.el8.x86_64
ceph-mgr-debuginfo-16.2.10-160.el8.x86_64
ceph-test-debuginfo-16.2.10-160.el8.x86_64
librados-devel-debuginfo-16.2.10-160.el8.x86_64
python3-cephfs-debuginfo-16.2.10-160.el8.x86_64
rbd-fuse-debuginfo-16.2.10-160.el8.x86_64
libtcmu-debuginfo-1.5.4-7.el8.x86_64
kernel-debuginfo-common-x86_64-4.18.0-425.13.1.el8_7.x86_64
ceph-mds-debuginfo-16.2.10-160.el8.x86_64
librados2-debuginfo-16.2.10-160.el8.x86_64
python3-rgw-debuginfo-16.2.10-160.el8.x86_64
elfutils-debuginfod-client-devel-0.187-4.el8.x86_64
ceph-base-debuginfo-16.2.10-160.el8.x86_64
ceph-osd-debuginfo-16.2.10-160.el8.x86_64
librbd1-debuginfo-16.2.10-160.el8.x86_64
rbd-nbd-debuginfo-16.2.10-160.el8.x86_64




Full log
---------
Apr 27 08:14:59 esg4wng02202 tcmu-runner: tcmu_rbd_close:1240 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: appended blocklist entry: {192.168.5.35:0/4245534485}
Apr 27 08:14:59 esg4wng02202 tcmu-runner: tcmu_rbd_rm_stale_entries_from_blocklist:382 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: removing blocklist entry: {192.168.5.35:0/2254410244}
Apr 27 08:15:00 esg4wng02202 tcmu-runner: tcmu_rbd_rm_stale_entries_from_blocklist:382 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: removing blocklist entry: {192.168.5.35:0/4245534485}
Apr 27 08:15:00 esg4wng02202 tcmu-runner: tcmu_notify_lock_lost:273 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN01: Async lock drop. Old state 5
Apr 27 08:15:00 esg4wng02202 tcmu-runner: alua_implicit_transition:592 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN01: Starting write lock acquisition operation.
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_rbd_open:1209 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: address: {192.168.5.35:0/2863440141}
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_rbd_close:1240 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN01: appended blocklist entry: {192.168.5.35:0/1426901708}
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_rbd_open:1209 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN02: address: {192.168.5.35:0/3447570735}
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_acquire_dev_lock:488 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN03: Read lock acquisition successful
Apr 27 08:15:01 esg4wng02202 tcmu-runner: tcmu_acquire_dev_lock:488 rbd/iscsirepldata01-esg4wng02201-esg4wng02202.grp-cds4wng1001-1002-VOTE1_LUN02: Read lock acquisition successful


Apr 27 08:15:01 esg4wng02202 kernel: [138236.446505] ework-thread[1960297]: segfault at b0 ip 00007fe9401b9194 sp 00007fda61ce49b8 error 4 in librados.so.2.0.0[7fe940107000+1a5000]
Apr 27 08:15:01 esg4wng02202 kernel: [138236.446785] Code: 41 5c e9 3f 30 f8 ff 0f 1f 80 00 00 00 00 5b 5d 41 5c c3 f3 0f 1e fa 48 89 c3 e9 1f a1 f8 ff 90 66 0f 1f 44 00 00 f3 0f 1e fa <83> bf b0 00 00 00 02 75 13 48 81 c7 a8 07 00 00 e9 c7 2f f8 ff 0f

Apr 27 08:16:01 esg4wng02202 kernel: [138296.879907] ABORT_TASK: Found referenced iSCSI task_tag: 70
Apr 27 08:16:08 esg4wng02202 kernel: [138303.778468] ABORT_TASK: Found referenced iSCSI task_tag: 42
Apr 27 08:16:14 esg4wng02202 kernel: [138309.872171] ABORT_TASK: Found referenced iSCSI task_tag: 38
Apr 27 08:16:16 esg4wng02202 kernel: [138311.970428] ABORT_TASK: Found referenced iSCSI task_tag: 55
Apr 27 08:16:33 esg4wng02202 kernel: [138328.977940] iSCSI Login timeout on Network Portal 192.168.55.243:3260
Apr 27 08:16:41 esg4wng02202 kernel: [138336.657922] iSCSI Login timeout on Network Portal 192.168.56.243:3260
Apr 27 08:16:41 esg4wng02202 kernel: [138336.658289] tx_data returned -32, expecting 48.
Apr 27 08:16:41 esg4wng02202 kernel: [138336.658641] iSCSI Login negotiation failed.
Apr 27 08:16:57 esg4wng02202 kernel: [138352.017883] iSCSI Login timeout on Network Portal 192.168.56.243:3260
[root@esg4wng02202 ~]#


These are all still at the older version 16.2.0-117.el8cp in the sosreport:

cephadm-16.2.0-117.el8cp.noarch                             Tue Oct 25 10:03:40 2022
ceph-common-16.2.0-117.el8cp.x86_64                         Tue Oct 25 10:03:44 2022
libcephfs2-16.2.0-117.el8cp.x86_64                          Fri Oct 21 09:58:17 2022
python3-cephfs-16.2.0-117.el8cp.x86_64                      Fri Oct 21 09:58:17 2022
python3-ceph-argparse-16.2.0-117.el8cp.x86_64               Fri Oct 21 09:58:17 2022
python3-ceph-common-16.2.0-117.el8cp.x86_64                 Fri Oct 21 09:58:17 2022


But after the upgrade:

[loberman@lobefedora sosreport-esg4wng02201-03477154-2023-04-27-ewrzglz]$ cat sos_commands/ceph/ceph_versions 
{
    "mon": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 96
    },
    "mds": {},
    "tcmu-runner": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 32
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 133
    }
}

Likely everything is upgraded, but this is not a good thing for support folks.
It seems the upgrade does not happen via rpm package updates.

Comment 1 loberman 2023-04-27 19:10:33 UTC
Raimund noticed this; it was at the top of my screen and I missed it.
It makes sense why we cannot open the core file.
We will check with the customer.


Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/tcmu-runner...Reading symbols from /usr/lib/debug/usr/bin/tcmu-runner-1.5.4-7.el8.x86_64.debug...done.
done.
BFD: warning: /home/loberman/core.tcmu-runner.0.7b56bb3deb88430fa96e13c4fc7ad1ed.1152774.1682583301000000 is truncated: expected core file size >= 4912242688, found: 2147483648

Comment 4 loberman 2023-04-30 14:16:46 UTC
We are still waiting on a valid core from the customer.
The first one was truncated.
Regards
Laurence

Comment 13 loberman 2023-05-05 14:27:36 UTC
I am not sure if I have the backtrace right yet, using my test node to debug,
but to me it looks like the cluster pointer is NULL:

(gdb) up
#1  0x00007f7d3bcece7e in rados_service_update_status () from /lib64/librados.so.2
(gdb) up
#2  0x00007f7d3cb3546e in tcmu_rbd_service_status_update (dev=0x41a7600, has_lock=false)
    at /usr/src/debug/tcmu-runner-1.5.4-7.el8.x86_64/rbd.c:137
137		ret = rados_service_update_status(state->cluster, status_buf);

(gdb) p state->cluster
$1 = (rados_t) 0x0


static int tcmu_rbd_service_status_update(struct tcmu_device *dev,
                                          bool has_lock)
{
        struct tcmu_rbd_state *state = tcmur_dev_get_private(dev);
        struct tcmur_device *rdev = tcmu_dev_get_private(dev);
        char *status_buf = NULL;
        int ret;

        ret = asprintf(&status_buf,
                       "%s%c%s%c%s%c%"PRIu64"%c%s%c%"PRIu64"%c%s%c%"PRIu64"%c",
                       "lock_owner", '\0', has_lock ? "true" : "false", '\0',
                       "lock_lost_cnt", '\0', rdev->lock_lost_cnt, '\0',
                       "conn_lost_cnt", '\0', rdev->conn_lost_cnt, '\0',
                       "cmd_timed_out_cnt", '\0', rdev->cmd_timed_out_cnt, '\0');
        if (ret < 0) {
                tcmu_dev_err(dev, "Could not allocate status buf. Service will not be updated.\n");
                return ret;
        }

        ret = rados_service_update_status(state->cluster, status_buf);
        if (ret < 0) {
                tcmu_dev_err(dev, "Could not update service status. (Err %d)\n",
                             ret);
        }

        free(status_buf);
        return ret;
}

Comment 14 loberman 2023-05-05 14:31:29 UTC
tcmur_device: skip reporting events if the device is closed 

After the rbd device is blocklisted it loses the connection and at the same time all the IOs time out, while tcmu-runner reports a 'lock_lost_cnt' event to the ceph cluster. If the device is reopened during this window, it causes a use-after-free bug.

Notes
When reopening the device, it holds the 'rdev->rdev_lock' lock and then clears the 'TCMUR_DEV_FLAG_IS_OPEN' flag. We can check this flag and just skip reporting events if the device is closed.

So null means the device is closed?

We need to understand why we were blocklisted though. That should only happen if we have access to the same LUN from two gateways at the same time.

@Raimund, let's also get the logs available so we can check the blocklisting.
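
For reference, here is a minimal sketch of the guard described in the notes above. The lock and flag names ('rdev->rdev_lock', 'TCMUR_DEV_FLAG_IS_OPEN') are taken from the quoted text; the wrapper name is hypothetical and this is not the actual upstream patch.

/*
 * Illustrative sketch only (not the upstream fix): take 'rdev->rdev_lock',
 * check TCMUR_DEV_FLAG_IS_OPEN, and skip the Rados status update when the
 * device has been closed or is being reopened, since its rados_t handle
 * may already have been freed at that point.  Assumes tcmu-runner's
 * internal headers and structures (tcmur_device.h, pthread.h).
 */
static void tcmu_rbd_report_event_if_open(struct tcmu_device *dev,
                                          bool has_lock)
{
        struct tcmur_device *rdev = tcmu_dev_get_private(dev);

        pthread_mutex_lock(&rdev->rdev_lock);
        if (!(rdev->flags & TCMUR_DEV_FLAG_IS_OPEN)) {
                /* Device closed: reporting now would touch freed state. */
                pthread_mutex_unlock(&rdev->rdev_lock);
                return;
        }
        tcmu_rbd_service_status_update(dev, has_lock);
        pthread_mutex_unlock(&rdev->rdev_lock);
}

With a guard like this, the state->cluster seen as NULL in comment 13 would never be dereferenced, because the update is skipped once the device is closed.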

Comment 15 loberman 2023-05-05 14:35:50 UTC
Chatted with Raimund

Comment 18 Manny 2023-05-05 15:08:20 UTC
Hello @

Comment 19 Manny 2023-05-05 15:11:51 UTC
Hello @rsachere and @loberman ,

FYI, I'd be cautious about using the Block List as anything of value. There was a site-down incident where a Block List entry was thought to be part of the cause, and it turned out to have nothing to do with it. Raimund, ping HKlein if you want to know more.

Best regards,
Manny

Comment 22 Xiubo Li 2023-05-06 01:22:01 UTC
(In reply to loberman from comment #14)
> tcmur_device: skip reporting events if the device is closed 
> 
> After the rbd device is blocklisted it loses the connection and at the
> same time all the IOs time out, while tcmu-runner reports a
> 'lock_lost_cnt' event to the ceph cluster. If the device is reopened
> during this window, it causes a use-after-free bug.
> 
> Notes
> When reopening the device, it holds the 'rdev->rdev_lock' lock and then
> clears the 'TCMUR_DEV_FLAG_IS_OPEN' flag. We can check this flag and
> just skip reporting events if the device is closed.
> 
> So null means the device is closed?
> 

No, it means the network connection entity with Rados is freed after the device is closed.

> We need to understand why we were blocklisted though. That should only
> happen if we have access to the same LUN from two gateways at the same
> time.
> 
> @Raimund, let's also get the logs available so we can check the blocklisting.

Yeah, for tcmu-runner there will always be only one active LUN, so when another gateway tries to access the Ceph Rados, that gateway will try to acquire the rbd exclusive lock from Rados, and Rados will then blocklist the current lock owner. Please make sure the multipath configuration is correct on the initiator side; ceph-iscsi currently cannot do real active/active, so the customer should make sure the path choice policy is ALUA.

Thanks.
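
For reference, the initiator-side multipath.conf device stanza recommended in the ceph-iscsi/RHCS documentation (which selects the ALUA priority handler mentioned above) looks roughly like the following; the values are illustrative, so verify them against the documentation for your release:

# /etc/multipath.conf on the iSCSI initiator (illustrative values)
devices {
        device {
                vendor                 "LIO-ORG"
                product                "TCMU device"
                hardware_handler       "1 alua"
                path_grouping_policy   "failover"
                path_selector          "queue-length 0"
                failback               60
                path_checker           tur
                prio                   alua
                prio_args              exclusive_pref_bit
                fast_io_fail_tmo       25
                no_path_retry          queue
        }
}

After editing, reload the configuration (e.g. 'systemctl reload multipathd') and confirm with 'multipath -ll' that one path group is active and the others are in enabled/ghost state.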

Comment 40 Preethi 2023-05-09 03:49:58 UTC
@Xiubo Li, please provide the complete test steps to verify the BZ.

Comment 46 Preethi 2023-05-15 16:50:55 UTC
Hi Xiubo Li,

Following are the test steps performed for verifying the BZ - no issue was seen during the test.
1. Set up a ceph-iscsi cluster, which should include 2 gateways, and enable the debug logs in the tcmu-runner services on these two gateways.
2. On the initiator side, use fio or other test tools to write raw IOs to these two '/dev/sd{x}' scsi devices directly at the same time.
3. Run the test in step 2 for minutes or longer to see whether you hit the crash.
4. If you hit any crash again, please provide the debug logs.

NOTE - 1) IOs against the same image via different gateway paths, i.e. /dev/sda and /dev/sdb, were run for a few hours - the issue was not seen.
- 2) Ran FIO against random LUNs for 5-6 hours and no issues were seen.
FYI, debug logging was enabled and there was no segfault/crash or any other error noticed in the tcmu logs.

ceph version and tcmu snippet

[root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# rpm -qa | grep tcmu*
[root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# ceph version
ceph version 16.2.10-171.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)

[root@ceph-amk-upgrade-53-ga-s8g9v2-node5 /]# rpm -qa | grep tcmu*
tcmu-runner-1.5.4-8.el8cp.x86_64
libtcmu-1.5.4-8.el8cp.x86_64

FIO workload used -  
fio --name=test-1 --numjobs=1 --rw=write/randwrite --bs=4/8/32k --iodepth=8 --fsync=32 --time_based --group_reporting --ioengine=libaio --filename=/dev/sda

Comment 48 Xiubo Li 2023-05-16 00:36:57 UTC
(In reply to Preethi from comment #46)
> Hi Xiubo Li,
> 
> Following are the test steps performed for verifying the BZ - no issue
> was seen during the test.
> 1. Set up a ceph-iscsi cluster, which should include 2 gateways, and
> enable the debug logs in the tcmu-runner services on these two gateways.
> 2. On the initiator side, use fio or other test tools to write raw IOs
> to these two '/dev/sd{x}' scsi devices directly at the same time.
> 3. Run the test in step 2 for minutes or longer to see whether you hit
> the crash.
> 4. If you hit any crash again, please provide the debug logs.
> 
> NOTE - 1) IOs against the same image via different gateway paths, i.e.
> /dev/sda and /dev/sdb, were run for a few hours - the issue was not seen.
> - 2) Ran FIO against random LUNs for 5-6 hours and no issues were seen.
> FYI, debug logging was enabled and there was no segfault/crash or any
> other error noticed in the tcmu logs.
> 
> ceph version and tcmu snippet
> 
> [root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# rpm -qa |
> grep tcmu*
> [root@ceph-amk-upgrade-53-ga-s8g9v2-node1-installer cephuser]# ceph version
> ceph version 16.2.10-171.el8cp (00a157ecd158911ece116ae43095de793ed9f389)
> pacific (stable)
> 
> [root@ceph-amk-upgrade-53-ga-s8g9v2-node5 /]# rpm -qa | grep tcmu*
> tcmu-runner-1.5.4-8.el8cp.x86_64
> libtcmu-1.5.4-8.el8cp.x86_64
> 
> FIO workload used -  
> fio --name=test-1 --numjobs=1 --rw=write/randwrite --bs=4/8/32k --iodepth=8
> --fsync=32 --time_based --group_reporting --ioengine=libaio
> --filename=/dev/sda

Nice work Preethi.

Thanks very much for your tests.
- Xiubo

Comment 50 errata-xmlrpc 2023-05-23 00:19:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3259

