Bug 880121 - multipathd crash and prevent daemon restart "multipathd: mpathdx: error getting map status string"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Ben Marzinski
QA Contact: Barry Donahue
URL:
Whiteboard:
Duplicates: 895110
Depends On:
Blocks: 1198418
Reported: 2012-11-26 10:17 UTC by Gris Ge
Modified: 2015-10-14 16:13 UTC

Fixed In Version: device-mapper-multipath-0.4.9-84.el6
Doc Type: Bug Fix
Doc Text:
Cause: If multipathd failed to add a multipath device, in some circumstances it freed the alias, then accessed it and attempted to free it again.
Consequence: multipathd would crash if it tried to add a multipath device that was too large for it to handle (and far too large to be practical in a real-world application).
Fix: multipathd no longer frees the alias twice or attempts to access the freed alias.
Result: multipathd no longer crashes when it fails to add a multipath device.
Clone Of:
: 1198418 (view as bug list)
Environment:
Last Closed: 2015-07-22 07:25:21 UTC
Target Upstream Version:


Attachments
crash dump for device-mapper-multipath-0.4.9-62 (966.80 KB, application/octet-stream)
2012-11-26 10:24 UTC, Gris Ge


Links
System: Red Hat Product Errata
ID: RHBA-2015:1391
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper-multipath bug fix and enhancement update
Last Updated: 2015-07-20 18:07:34 UTC

Description Gris Ge 2012-11-26 10:17:33 UTC
Description of problem:
While testing multipath over iSCSI, multipathd crashed. The crash also prevents the daemon from restarting, with this error message:
===
multipathd: mpathdx: error getting map status string
===

The crash dump file is attached.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-62

How reproducible:
only hit it twice.

Steps to Reproduce:
1. Create a multipath over iSCSI.
2. Clone its iSCSI iface and iSCSI node:
===
# Create 129 cloned iSCSI ifaces and log each one in to the same target.
for X in $(seq 1 129); do
    # Clone the iface: new record, same initiator name, TCP transport.
    iscsiadm -m iface -o new -I "gris_tmp_iface_$X"
    iscsiadm -m iface -I "gris_tmp_iface_$X" -o update \
      -n iface.initiatorname -v iqn.1994-05.com.redhat:gris-dev-2
    iscsiadm -m iface -I "gris_tmp_iface_$X" -o update \
      -n iface.transport_name -v tcp

    # Create a node record for the target on the new iface, then log in.
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
      -I "gris_tmp_iface_$X" -p 10.16.43.127 -o new
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
      -I "gris_tmp_iface_$X" -l
done
===
3. Wait for udev.
  
Actual results:
multipathd crashes.

Expected results:
multipathd does not crash.

Additional info:

The reproduction script above is not the code I actually use when testing; it is just a translation of my Perl code.

The backtrace from gdb:
===============================
(gdb) bt
#0  0x00007fa4f12db8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa4f12dd085 in abort () at abort.c:92
#2  0x00007fa4f13197b7 in __libc_message (do_abort=2, 
    fmt=0x7fa4f1400f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007fa4f131f0e6 in malloc_printerr (action=3, str=0x7fa4f14012c0 "double free or corruption (out)", 
    ptr=<value optimized out>) at malloc.c:6311
#4  0x00007fa4f1321c13 in _int_free (av=0x7fa4f1637e80, p=0x7fa4cc06e170, have_lock=0) at malloc.c:4811
#5  0x0000000000408501 in ev_add_map (dev=0x7fa4cc001d10, vecs=<value optimized out>) at main.c:304
#6  0x0000000000408af2 in uev_add_map (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:235
#7  uev_trigger (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:731
#8  0x00007fa4f1c9d214 in service_uevq () at uevent.c:109
#9  0x00007fa4f1c9d2b7 in uevq_thread (et=<value optimized out>) at uevent.c:135
#10 0x00007fa4f1643851 in start_thread (arg=0x7fa4f2911700) at pthread_create.c:301
#11 0x00007fa4f139190d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
===============================
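
The backtrace shows glibc aborting on a double free inside ev_add_map, and the Doc Text field above attributes this to the alias being freed, then accessed and freed again when adding a map fails. The following is a minimal sketch of one common way to rule out that class of bug. It is not the actual multipathd patch: `struct map`, `add_map`, `free_map`, and `max_paths` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a multipath map; not the real struct multipath. */
struct map {
    char *alias;
    size_t npaths;
};

/*
 * Hypothetical add-map helper. It always consumes *alias_p: on success the
 * string moves into the new map, on failure it is freed here. In both cases
 * *alias_p is set to NULL, so the caller can neither read nor re-free it.
 * max_paths is an illustrative limit, not a real multipathd setting.
 */
static struct map *add_map(char **alias_p, size_t npaths, size_t max_paths)
{
    char *alias = *alias_p;
    *alias_p = NULL;                 /* caller gives up ownership here */

    struct map *m = NULL;
    if (alias != NULL && npaths <= max_paths)
        m = calloc(1, sizeof(*m));
    if (m == NULL) {
        free(alias);                 /* the single free on the failure path */
        return NULL;
    }
    m->alias = alias;
    m->npaths = npaths;
    return m;
}

/* Releases a map and the alias it owns; safe to call with NULL. */
static void free_map(struct map *m)
{
    if (m == NULL)
        return;
    free(m->alias);
    free(m);
}
```

With this shape, a caller's cleanup path can call `free(alias)` unconditionally after `add_map` returns, because the pointer is NULL by then and `free(NULL)` is a no-op.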

Comment 1 Gris Ge 2012-11-26 10:24:26 UTC
Created attachment 651875
crash dump for device-mapper-multipath-0.4.9-62

Comment 2 Gris Ge 2012-11-26 10:26:28 UTC
I can 100% reproduce this problem on storage-qe server.

Comment 4 Gris Ge 2012-11-27 02:44:28 UTC
Same issue found in RHEL 6.3. Not a regression.

Comment 5 RHEL Product and Program Management 2012-12-14 07:24:46 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 Rami Vaknin 2015-01-29 08:54:53 UTC
I got the same error message from multipathd while trying to increase the number of iscsi sessions to ~1000.

I doubled the number of sessions a few times using:
for i in  /sys/devices/platform/host*/session* ; do echo $i ; iscsiadm -m session -r $i -o new ; done


[root@lg509 ~]# multipath -ll
Jan 29 10:25:31 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]# /etc/init.d/multipathd status
multipathd dead but pid file exists
[root@lg509 ~]# /etc/init.d/multipathd restart
ux_socket_connect: Connection refused
Stopping multipathd daemon:                                [FAILED]
Starting multipathd daemon:                                [  OK  ]
[root@lg509 ~]# multipath -ll
Jan 29 10:26:39 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]#



This is on RHEL 6.5, with these RPM versions:

[root@lg509 ~]# uname -r
2.6.32-431.el6.x86_64
[root@lg509 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-80.el6.x86_64
device-mapper-multipath-libs-0.4.9-80.el6.x86_64
[root@lg509 ~]#

Comment 7 Ben Marzinski 2015-01-30 05:46:15 UTC
The cause of this seems pretty likely to be that multipathd can't handle a device-mapper table/status line that big, and instead of failing gracefully, it's failing badly, and multipathd looks to be overwriting memory.  I will certainly fix the memory corruption.  However, I might end up putting a limit on the number of paths that multipath will create in the first place; I can't see any real practical use for 128 paths to a device.
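
To illustrate what "failing gracefully" on an oversized table line can look like, here is a minimal sketch of appending path entries to a fixed-size buffer with a truncation check. It is an assumption-laden illustration, not multipathd code: `append_path` and the " <dev> <weight>" entry format are hypothetical, and no real multipathd buffer size is implied.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append " <dev> <weight>" to the line stored in
 * buf[0..size). *used is the current string length and must be < size.
 * Returns 0 on success. Returns -1 if the entry does not fit, in which
 * case the buffer is restored to its previous contents and nothing is
 * written past buf[size - 1].
 */
static int append_path(char *buf, size_t size, size_t *used,
                       const char *dev, int weight)
{
    assert(*used < size);
    size_t room = size - *used;

    /* snprintf never writes more than room bytes, including the NUL. */
    int n = snprintf(buf + *used, room, " %s %d", dev, weight);
    if (n < 0 || (size_t)n >= room) {
        buf[*used] = '\0';           /* undo the partial write */
        return -1;
    }
    *used += (size_t)n;
    return 0;
}
```

A caller that sees -1 can log an error and give up on creating the map, leaving the rest of the daemon's state intact.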

Comment 8 Ben Marzinski 2015-02-19 18:24:01 UTC
I've fixed the memory corruption issue.  But like I mentioned earlier, I did not make multipath able to handle arbitrarily large device tables.  Multipath will still fail to create tables that are too large. This happens somewhere between 256 and 1024 paths, depending on how the device is configured.

Comment 12 errata-xmlrpc 2015-07-22 07:25:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1391.html

Comment 13 Ben Marzinski 2015-10-14 16:13:04 UTC
*** Bug 895110 has been marked as a duplicate of this bug. ***

