Bug 1442369 - Multipathd crashes
Summary: Multipathd crashes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks: 1444194 1461138 1507140
Reported: 2017-04-14 10:31 UTC by Anandhakannan Subramanian
Modified: 2020-08-13 09:03 UTC (History)

Fixed In Version: device-mapper-multipath-0.4.9-103.el6
Doc Type: Bug Fix
Doc Text:
Cause: The code to check if a dm device was a partition of a multipath device gave the incorrect answer for any device whose table contained a device with a minor number that was the same as the multipath device minor number with additional digits on the end. Consequence: When removing or renaming a multipath device, multipath could recursively check a device over and over again, thinking it was a partition of itself, and eventually run out of memory and crash. Fix: Multipath's partition device check is much more robust. Result: multipath correctly identifies which dm devices are partitions of other devices, and will rename or remove them without crashing.
Clone Of:
Clones: 1444194
Environment:
Last Closed: 2018-06-19 05:17:52 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1893 0 None None None 2018-06-19 05:19:13 UTC

Description Anandhakannan Subramanian 2017-04-14 10:31:47 UTC
Description of problem:

We suspect a regression of https://bugzilla.redhat.com/show_bug.cgi?id=1349376

Program terminated with signal 11, Segmentation fault.
#0  dm_rename_partmaps (old=0x46542fc "mpathhzyp4", new=0x7ffc0b953070 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4") at devmapper.c:1113
1113		if (!(dmt = dm_task_create(DM_DEVICE_LIST)))


Version-Release number of selected component (if applicable):

device-mapper-1.02.117-12.el6.x86_64                        Tue Mar 28 15:15:26 2017
device-mapper-event-1.02.117-12.el6.x86_64                  Tue Mar 28 15:15:26 2017
device-mapper-event-libs-1.02.117-12.el6.x86_64             Tue Mar 28 15:15:26 2017
device-mapper-libs-1.02.117-12.el6.x86_64                   Tue Mar 28 15:15:26 2017
device-mapper-multipath-0.4.9-100.el6.x86_64                Tue Mar 28 15:16:16 2017
device-mapper-multipath-libs-0.4.9-100.el6.x86_64           Tue Mar 28 15:15:27 2017
device-mapper-persistent-data-0.6.2-0.1.rc7.el6.x86_64      Mon Jul 11 10:47:31 2016


Expected results:

multipathd should not crash

Additional info:

(gdb) bt
#0  dm_rename_partmaps (old=0x46542fc "mpathhzyp4", new=0x7ffc0b953070 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4")
    at devmapper.c:1113
#1  0x0000003c79c12b85 in dm_rename (old=0x46542fc "mpathhzyp4", 
    new=0x7ffc0b953070 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4", skip_kpartx=<value optimized out>) at devmapper.c:1185
#2  0x0000003c79c12b25 in dm_rename_partmaps (old=0x465020c "mpathhzyp4", 
    new=0x7ffc0b954150 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4") at devmapper.c:1161
#3  0x0000003c79c12b85 in dm_rename (old=0x465020c "mpathhzyp4", 
    new=0x7ffc0b954150 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4", skip_kpartx=<value optimized out>) at devmapper.c:1185
#4  0x0000003c79c12b25 in dm_rename_partmaps (old=0x464c11c "mpathhzyp4", 
    new=0x7ffc0b955230 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4") at devmapper.c:1161

Comment 3 Ben Marzinski 2017-04-17 20:10:30 UTC
I'm not so sure that this is a regression of Bug 1349376, although that was my first guess as well. Looking at the symbols in libmultipath.so (dm_rename_partmaps is in libmultipath, and multipath and multipathd do not call dm_task_create directly, only libmultipath does) I see:

** This is using device-mapper-multipath-debuginfo-0.4.9-100.el6.x86_64.rpm
[bmarzins@octiron lib64]$ nm libmultipath.so.debug | grep dm_task_create
                 U dm_task_create@@Base

So, it is versioned. Looking at dmsetup, to see that the symbol versioning matches, I see

** This is using lvm2-debuginfo-2.02.143-12.el6.x86_64.rpm which has the debug symbols for device-mapper-1.02.117-12.el6.x86_64.rpm
[bmarzins@octiron sbin]$ nm lvm.debug | grep dm_task_create
                 U dm_task_create@@Base

So the symbol versioning matches.  On the other hand, I can't easily see how you can get a segfault on this line:

1113		if (!(dmt = dm_task_create(DM_DEVICE_LIST)))

assuming that really is where the segfault happened. Could you possibly post a core dump for me to look at?

Comment 5 Ben Marzinski 2017-04-20 01:23:46 UTC
Well, I know what's going on, and it's pretty impressive that this bug hasn't been hit before.  If you look at the 4 backtrace lines you posted, you can see that they are calling the same two functions over and over again. That's because multipath is stuck in an infinite recursion.

Here is the top of the stack, almost 6000 frames up

#5817 0x0000003c79c12b85 in dm_rename (old=0x183b6fc "mpathhzyp4", 
    new=0x7ffc0c54e0f0 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4", 
    skip_kpartx=<value optimized out>) at devmapper.c:1185
#5818 0x0000003c79c12b25 in dm_rename_partmaps (old=0x18376ec "mpathhzyp4", 
    new=0x7ffc0c54f1d0 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4") at devmapper.c:1161
#5819 0x0000003c79c12b85 in dm_rename (old=0x18376ec "mpathhzyp4", 
    new=0x7ffc0c54f1d0 "SCP-EBSINT-BAIE09-PUR-ERPIC-24Lp4", 
    skip_kpartx=<value optimized out>) at devmapper.c:1185
#5820 0x0000003c79c12b25 in dm_rename_partmaps (old=0x18352a0 "mpathhzy", 
    new=0x15f0ec0 "SCP-EBSINT-BAIE09-PUR-ERPIC-24L") at devmapper.c:1161
#5821 0x0000003c79c12b85 in dm_rename (old=0x18352a0 "mpathhzy", 
    new=0x15f0ec0 "SCP-EBSINT-BAIE09-PUR-ERPIC-24L", 
    skip_kpartx=<value optimized out>) at devmapper.c:1185
#5822 0x0000003c79c313c3 in domap (mpp=0x1835220) at configure.c:643
#5823 0x0000003c79c3215b in coalesce_paths (vecs=0x15fb840, newmp=0x15fbbf0, 
    refwwid=0x0, force_reload=1) at configure.c:810
#5824 0x0000000000406390 in configure (vecs=0x15fb840, start_waiters=1)
    at main.c:1458
#5825 0x0000000000406dc6 in child (argc=<value optimized out>, 
    argv=<value optimized out>) at main.c:1766
#5826 main (argc=<value optimized out>, argv=<value optimized out>)
    at main.c:1992

What's happening here is that multipath is trying to figure out what partitions are on top of the multipath device.  The multipath device mpathhzy is 253:100

(gdb) frame 5820
(gdb) print dev_t
$3 = "253:100"

and the kpartx device is 253:10

(gdb) frame 2
$4 = "253:10"

This is the cause of all the problems.  When multipath tries to determine if a kpartx partition device belongs to a multipath device, it makes sure that the multipath part of their dm uuid is the same, and then it checks if the multipath device (253:100) appears in the kpartx device table. If so, it renames the kpartx device as well.  The problem is that it does so by recursively calling dm_rename, which tries to see if there are any kpartx devices that are using this device (253:10) in their device table.  The way it checks is strstr(<table_string>, <device>).  When it is checking for 253:10 and the table contains 253:100, strstr will match that, which means that the kpartx device's own table will pass the check to see if it is a partition of itself, which causes an endless recursion. So, clearly multipath needs smarter code to find the kpartx partitions of a multipath device.
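
To make the false match concrete, here is a minimal sketch of the substring problem and one possible stricter check. It is not the code from devmapper.c or the fix that shipped in the erratum; the function names and the example table string ("0 2097152 linear 253:100 2048") are assumptions made up for illustration, and the stricter variant simply assumes that the major:minor pair appears in the table as a whitespace-delimited token.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the naive substring test described above.
 * "253:10" is found inside "253:100", so this reports a false match. */
bool table_mentions_dev_naive(const char *table, const char *dev)
{
    return strstr(table, dev) != NULL;
}

/* Hypothetical helper: accept a match only when it is a whole token,
 * i.e. bounded by whitespace or by the start/end of the table string. */
bool table_mentions_dev_strict(const char *table, const char *dev)
{
    size_t len = strlen(dev);
    const char *p = table;

    if (len == 0)
        return false;

    while ((p = strstr(p, dev)) != NULL) {
        bool start_ok = (p == table) || isspace((unsigned char)p[-1]);
        bool end_ok = (p[len] == '\0') || isspace((unsigned char)p[len]);

        if (start_ok && end_ok)
            return true;
        p++; /* keep scanning after a partial match */
    }
    return false;
}
```

With the example table above, the naive helper reports that 253:10 is present, while the strict helper only matches 253:100. Any check that compares the full major:minor pair rather than a prefix would break the self-match that drives the recursion.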

This bug exists in RHEL7 and upstream as well.

Comment 19 errata-xmlrpc 2018-06-19 05:17:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1893

