880121 – multipathd crash and prevent daemon restart "multipathd: mpathdx: error getting map status string"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	device-mapper-multipath
Sub Component:
Version:	6.4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Ben Marzinski
QA Contact:	Barry Donahue
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	895110
Depends On:
Blocks:	1198418

Reported:	2012-11-26 10:17 UTC by Gris Ge
Modified:	2015-10-14 16:13 UTC
CC List:	12 users
Fixed In Version:	device-mapper-multipath-0.4.9-84.el6
Doc Type:	Bug Fix
Doc Text:	Cause: If multipathd failed to add a multipath device, in some circumstances it freed the alias, then accessed it and attempted to free it again.
Consequence: multipathd would crash if it tried to add a multipath device that was too large for it to handle (and far too large to be practical in a real-world application).
Fix: multipathd no longer frees the alias twice or accesses the freed alias.
Result: multipathd no longer crashes when it fails to add a multipath device.
Clone Of:
Clones:	1198418
Environment:
Last Closed:	2015-07-22 07:25:21 UTC
Target Upstream Version:
Embargoed:

Attachments
crash dump for device-mapper-multipath-0.4.9-62 (966.80 KB, application/octet-stream) 2012-11-26 10:24 UTC, Gris Ge

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:1391	0	normal	SHIPPED_LIVE	device-mapper-multipath bug fix and enhancement update	2015-07-20 18:07:34 UTC

Description Gris Ge 2012-11-26 10:17:33 UTC

Description of problem:
While testing multipath over iSCSI, multipathd crashed. It also prevented the daemon from restarting, with this error message:
===
multipathd: mpathdx: error getting map status string
===

The crash dump file is attached.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-62

How reproducible:
Only hit it twice.

Steps to Reproduce:
1. Create a multipath over iSCSI.
2. Clone its iSCSI iface and iSCSI node 129 times, logging in through each clone:
===
for X in $(seq 1 129); do
    # Create a new iface with the same initiator name, over TCP
    iscsiadm -m iface -o new -I gris_tmp_iface_$X
    iscsiadm -m iface -I gris_tmp_iface_$X -o update \
      -n iface.initiatorname -v iqn.1994-05.com.redhat:gris-dev-2
    iscsiadm -m iface -I gris_tmp_iface_$X -o update \
      -n iface.transport_name -v tcp

    # Create a node record for the target on this iface and log in;
    # each new session adds one more path to the same LUN(s)
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
      -I gris_tmp_iface_$X -p 10.16.43.127 -o new
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
      -I gris_tmp_iface_$X -l
done
===
3. Wait for udev to finish processing the new devices (waitudev).
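The reproducer leaves 129 iface records, node records and logged-in sessions behind. A minimal cleanup sketch follows; it assumes the same target IQN, portal and gris_tmp_iface_N naming as the loop above, and `cleanup_tmp_ifaces` is a hypothetical helper name. It only uses the standard `iscsiadm` logout (`-u`) and delete (`-o delete`) operations.

```shell
#!/bin/sh
# Hypothetical cleanup for the reproducer: undo what the loop created.
# TARGET, PORTAL and the gris_tmp_iface_N names are copied from the loop;
# adjust them if your reproducer used different values.
TARGET=iqn.1992-08.com.netapp:sn.151753773
PORTAL=10.16.43.127

cleanup_tmp_ifaces() {
    # $1: how many gris_tmp_iface_N ifaces the reproducer created
    n=$1
    x=1
    while [ "$x" -le "$n" ]; do
        iface=gris_tmp_iface_$x
        # Log out of the session opened through this iface (drops one path)
        iscsiadm -m node -T "$TARGET" -I "$iface" -p "$PORTAL" -u
        # Remove the node record bound to this iface
        iscsiadm -m node -T "$TARGET" -I "$iface" -p "$PORTAL" -o delete
        # Remove the iface record itself
        iscsiadm -m iface -I "$iface" -o delete
        x=$((x + 1))
    done
}
```

Run it as root with `cleanup_tmp_ifaces 129`. Logging out before deleting is deliberate, since a node record that still has an active session may not be removable.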
  
Actual results:
multipathd crashes.

Expected results:
multipathd does not crash.

Additional info:

The reproduction script above is not the exact code used during testing; it is a translation of the original Perl code.

The backtrace from gdb:
===============================
(gdb) bt
#0  0x00007fa4f12db8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa4f12dd085 in abort () at abort.c:92
#2  0x00007fa4f13197b7 in __libc_message (do_abort=2, 
    fmt=0x7fa4f1400f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007fa4f131f0e6 in malloc_printerr (action=3, str=0x7fa4f14012c0 "double free or corruption (out)", 
    ptr=<value optimized out>) at malloc.c:6311
#4  0x00007fa4f1321c13 in _int_free (av=0x7fa4f1637e80, p=0x7fa4cc06e170, have_lock=0) at malloc.c:4811
#5  0x0000000000408501 in ev_add_map (dev=0x7fa4cc001d10, vecs=<value optimized out>) at main.c:304
#6  0x0000000000408af2 in uev_add_map (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:235
#7  uev_trigger (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:731
#8  0x00007fa4f1c9d214 in service_uevq () at uevent.c:109
#9  0x00007fa4f1c9d2b7 in uevq_thread (et=<value optimized out>) at uevent.c:135
#10 0x00007fa4f1643851 in start_thread (arg=0x7fa4f2911700) at pthread_create.c:301
#11 0x00007fa4f139190d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
===============================

Comment 1 Gris Ge 2012-11-26 10:24:26 UTC

Created attachment 651875
crash dump for device-mapper-multipath-0.4.9-62

Comment 2 Gris Ge 2012-11-26 10:26:28 UTC

I can 100% reproduce this problem on storage-qe server.

Comment 4 Gris Ge 2012-11-27 02:44:28 UTC

Same issue found in RHEL 6.3. Not a regression.

Comment 5 RHEL Program Management 2012-12-14 07:24:46 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 Rami Vaknin 2015-01-29 08:54:53 UTC

I got the same error message from multipathd while trying to increase the number of iSCSI sessions to ~1000.

I doubled the number of sessions a few times using:
for i in /sys/devices/platform/host*/session*; do echo "$i"; iscsiadm -m session -r "$i" -o new; done


[root@lg509 ~]# multipath -ll
Jan 29 10:25:31 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]# /etc/init.d/multipathd status
multipathd dead but pid file exists
[root@lg509 ~]# /etc/init.d/multipathd restart
ux_socket_connect: Connection refused
Stopping multipathd daemon:                                [FAILED]
Starting multipathd daemon:                                [  OK  ]
[root@lg509 ~]# multipath -ll
Jan 29 10:26:39 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]#
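The `status` output above means the pid file is still present but no process with that PID is running. A minimal sketch of that check follows. The default pid file path `/var/run/multipathd.pid` is an assumption for RHEL 6, `multipathd_state` is a hypothetical helper, and the return codes follow the LSB init-script convention (0 running, 1 dead with pid file, 3 not running).

```shell
#!/bin/sh
# Sketch of the "dead but pid file exists" check.
# PIDFILE default is an assumption for RHEL 6; override it if yours differs.
PIDFILE=${PIDFILE:-/var/run/multipathd.pid}

multipathd_state() {
    if [ ! -f "$PIDFILE" ]; then
        echo "stopped"
        return 3
    fi
    pid=$(cat "$PIDFILE")
    # An empty or non-numeric pid file cannot point at a live process
    case $pid in
        ""|*[!0-9]*) echo "dead but pid file exists"; return 1 ;;
    esac
    # On Linux, a running process has a /proc/<pid> directory
    if [ -d "/proc/$pid" ]; then
        echo "running"
        return 0
    fi
    echo "dead but pid file exists"
    return 1
}
```

This sketch does not verify that the PID actually belongs to multipathd, so a reused PID would be reported as running.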



Working with RHEL 6.5; kernel and package versions:

[root@lg509 ~]# uname -r
2.6.32-431.el6.x86_64
[root@lg509 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-80.el6.x86_64
device-mapper-multipath-libs-0.4.9-80.el6.x86_64
[root@lg509 ~]#
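Comment 6 grows the session count by repeating the doubling loop by hand. The sketch below repeats that same `iscsiadm -m session -r ... -o new` pass until a target count is reached, and stops early if a pass adds no sessions. It assumes software iSCSI, where sessions appear under `/sys/devices/platform/host*/session*` as in the original command; `SYSFS`, `count_sessions` and `double_sessions_until` are hypothetical names.

```shell
#!/bin/sh
# Sketch: repeat the session-doubling pass until at least $1 sessions exist.
# SYSFS is a hypothetical override so the logic can be tried on a fake tree.
SYSFS=${SYSFS:-/sys}

count_sessions() {
    n=0
    for s in "$SYSFS"/devices/platform/host*/session*; do
        [ -e "$s" ] && n=$((n + 1))   # an unmatched glob stays literal; skip it
    done
    echo "$n"
}

double_sessions_until() {
    target=$1
    before=$(count_sessions)
    while [ "$before" -lt "$target" ]; do
        # One pass: request one new session per existing session
        for s in "$SYSFS"/devices/platform/host*/session*; do
            [ -e "$s" ] && iscsiadm -m session -r "$s" -o new
        done
        after=$(count_sessions)
        # Stop if the pass added nothing, rather than looping forever
        [ "$after" -gt "$before" ] || break
        before=$after
    done
    echo "$before"
}
```

For example, `double_sessions_until 1000` stops at the first starting-count times 2^k that is at least 1000, since each successful pass doubles the count. If new sessions show up in sysfs only after a delay, a pass may look like it made no progress and the loop ends early.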

Comment 7 Ben Marzinski 2015-01-30 05:46:15 UTC

The cause of this seems pretty likely to be that multipathd can't handle a device-mapper table/status line that big. Instead of failing gracefully, it fails badly, and multipathd appears to be overwriting memory. I can certainly fix the memory corruption. However, I might end up putting a limit on the number of paths that multipath will create in the first place; I can't see any real practical use for 128 paths to a device.
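Comment 7 attributes the crash to a device-mapper table/status line too large for multipathd. One way to see how long those lines are on a live system is to measure them with `dmsetup`, which queries the kernel directly. This is an illustration only: `mpath_line_lengths` is a hypothetical helper, and it does not know multipathd's actual buffer size, so it reports lengths rather than judging them.

```shell
#!/bin/sh
# Sketch: print the character length of the table and status lines of
# every multipath map. No particular length limit is assumed here.
mpath_line_lengths() {
    dmsetup ls --target multipath | while read -r name rest; do
        case $name in
            ""|No) continue ;;   # skip blank lines and a possible "No devices found"
        esac
        table=$(dmsetup table "$name")
        status=$(dmsetup status "$name")
        echo "$name table=${#table} status=${#status}"
    done
}
```

Run `mpath_line_lengths` as root; maps with many paths should stand out with much longer lines than the rest.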

Comment 8 Ben Marzinski 2015-02-19 18:24:01 UTC

I've fixed the memory corruption issue. But, as I mentioned earlier, I did not make multipath able to handle arbitrarily large device tables. Multipath will still fail to create tables that are too large; this happens somewhere between 256 and 1024 paths, depending on how the device is configured.
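Since the remaining limit is expressed in paths per map, one quick check is to count the path devices under each multipath map in sysfs. The sketch assumes that multipath maps are `dm-N` devices whose `dm/uuid` begins with `mpath-` and whose paths are listed under `slaves/`. `report_mpath_paths`, `SYSFS` and the default `THRESHOLD` of 256 (the low end of the range quoted above) are illustrative, not a documented multipathd limit.

```shell
#!/bin/sh
# Sketch: report the number of paths per multipath map, flagging large ones.
# THRESHOLD is illustrative only; SYSFS is a hypothetical override for testing.
SYSFS=${SYSFS:-/sys}
THRESHOLD=${THRESHOLD:-256}

report_mpath_paths() {
    for dm in "$SYSFS"/block/dm-*; do
        [ -r "$dm/dm/uuid" ] || continue
        case $(cat "$dm/dm/uuid") in
            mpath-*) ;;          # multipath map: keep going
            *) continue ;;       # other dm targets (LVM, crypt, ...): skip
        esac
        name=$(cat "$dm/dm/name")
        paths=0
        # Each path device of the map appears as an entry under slaves/
        for p in "$dm"/slaves/*; do
            [ -e "$p" ] && paths=$((paths + 1))
        done
        flag=""
        [ "$paths" -ge "$THRESHOLD" ] && flag=" (at or above $THRESHOLD)"
        echo "$name $paths$flag"
    done
}
```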

Comment 12 errata-xmlrpc 2015-07-22 07:25:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1391.html

Comment 13 Ben Marzinski 2015-10-14 16:13:04 UTC

*** Bug 895110 has been marked as a duplicate of this bug. ***
