Bug 880121
| Summary: | multipathd crash prevents daemon restart: "multipathd: mpathdx: error getting map status string" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Gris Ge <fge> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED ERRATA | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.4 | CC: | agk, bgoncalv, bmarzins, dwysocha, heinzm, msnitzer, prajnoha, prockai, Rami.Vaknin, rbalakri, yanwang, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | device-mapper-multipath-0.4.9-84.el6 | Doc Type: | Bug Fix |
| Doc Text: | Cause: If multipathd failed to add a multipath device, in some circumstances it freed the alias and then later accessed it and attempted to free it again. Consequence: multipathd crashed when it tried to add a multipath device that was too large for it to handle (and far too large to be practical in a real-world application). Fix: multipathd no longer frees the alias twice or accesses the freed alias. Result: multipathd no longer crashes when it fails to add a multipath device. | Story Points: | --- |
| Clone Of: | | | |
| : | 1198418 (view as bug list) | Environment: | |
| Last Closed: | 2015-07-22 07:25:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1198418 | | |
| Attachments: | crash dump for device-mapper-multipath-0.4.9-62 (attachment 651875) | | |
Created attachment 651875 [details]
crash dump for device-mapper-multipath-0.4.9-62
I can 100% reproduce this problem on a storage-qe server. The same issue is found in RHEL 6.3, so this is not a regression.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

I got the same error message from multipathd while trying to increase the number of iSCSI sessions to ~1000. I doubled the number of sessions a few times using:

for i in /sys/devices/platform/host*/session* ; do
    echo $i
    iscsiadm -m session -r $i -o new
done

[root@lg509 ~]# multipath -ll
Jan 29 10:25:31 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]# /etc/init.d/multipathd status
multipathd dead but pid file exists
[root@lg509 ~]# /etc/init.d/multipathd restart
ux_socket_connect: Connection refused
Stopping multipathd daemon: [FAILED]
Starting multipathd daemon: [ OK ]
[root@lg509 ~]# multipath -ll
Jan 29 10:26:39 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]#

Working with 6.5, rpm versions:

[root@lg509 ~]# uname -r
2.6.32-431.el6.x86_64
[root@lg509 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-80.el6.x86_64
device-mapper-multipath-libs-0.4.9-80.el6.x86_64
[root@lg509 ~]#

The cause of this seems pretty likely to be that multipathd can't handle a device-mapper table/status line that big, and instead of failing gracefully, it's failing badly: multipathd looks to be overwriting memory.

This certainly fixes the memory corruption. However, I might end up putting a limit on the number of paths that multipath will create in the first place; I can't see any real practical use for 128 paths to a device.

I've fixed the memory corruption issue. But as I mentioned earlier, I did not make multipath able to handle arbitrarily large device tables. Multipath will still fail to create tables that are too large. This happens somewhere between 256 and 1024 paths, depending on how the device is configured.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1391.html

*** Bug 895110 has been marked as a duplicate of this bug. ***
Description of problem:
When testing multipath over iSCSI, multipathd crashed. The crash also prevents the daemon from restarting, with this error message:
===
multipathd: mpathdx: error getting map status string
===
The crash dump file is attached.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-62

How reproducible:
Only hit it twice.

Steps to Reproduce:
1. Create a multipath device over iSCSI.
2. Try to clone its iSCSI iface and iSCSI node:
===
for X in `seq 1 129`; do
    iscsiadm -m iface -o new -I gris_tmp_iface_$X;
    iscsiadm -m iface -I gris_tmp_iface_$X -o update \
        -n iface.initiatorname -v iqn.1994-05.com.redhat:gris-dev-2;
    iscsiadm -m iface -I gris_tmp_iface_$X -o update \
        -n iface.transport_name -v tcp;
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
        -I gris_tmp_iface_$X -p 10.16.43.127 -o new;
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
        -I gris_tmp_iface_$X -l;
done
===
3. Wait for udev.

Actual results:
multipathd crashes.

Expected results:
multipathd does not crash.

Additional info:
The reproduction script above is not the exact code I was using when testing; it is just a translation of the original Perl code.
The backtrace from gdb:
===============================
(gdb) bt
#0  0x00007fa4f12db8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa4f12dd085 in abort () at abort.c:92
#2  0x00007fa4f13197b7 in __libc_message (do_abort=2, fmt=0x7fa4f1400f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007fa4f131f0e6 in malloc_printerr (action=3, str=0x7fa4f14012c0 "double free or corruption (out)", ptr=<value optimized out>) at malloc.c:6311
#4  0x00007fa4f1321c13 in _int_free (av=0x7fa4f1637e80, p=0x7fa4cc06e170, have_lock=0) at malloc.c:4811
#5  0x0000000000408501 in ev_add_map (dev=0x7fa4cc001d10, vecs=<value optimized out>) at main.c:304
#6  0x0000000000408af2 in uev_add_map (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:235
#7  uev_trigger (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:731
#8  0x00007fa4f1c9d214 in service_uevq () at uevent.c:109
#9  0x00007fa4f1c9d2b7 in uevq_thread (et=<value optimized out>) at uevent.c:135
#10 0x00007fa4f1643851 in start_thread (arg=0x7fa4f2911700) at pthread_create.c:301
#11 0x00007fa4f139190d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
===============================