Bug 880121
| Summary: | multipathd crash prevents daemon restart: "multipathd: mpathdx: error getting map status string" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Gris Ge <fge> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED ERRATA | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.4 | CC: | agk, bgoncalv, bmarzins, dwysocha, heinzm, msnitzer, prajnoha, prockai, Rami.Vaknin, rbalakri, yanwang, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | device-mapper-multipath-0.4.9-84.el6 | Doc Type: | Bug Fix |
| Doc Text: | Cause: If multipathd failed to add a multipath device, in some circumstances it freed the alias and then later accessed it and attempted to free it again. Consequence: multipathd crashed when it tried to add a multipath device that was too large for it to handle (and far too large to be practical in a real-world application). Fix: multipathd no longer frees the alias twice or accesses the freed alias. Result: multipathd no longer crashes when it fails to add a multipath device. | Story Points: | --- |
| Clone Of: | | | |
| : | 1198418 (view as bug list) | Environment: | |
| Last Closed: | 2015-07-22 07:25:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1198418 | | |
| Attachments: | crash dump for device-mapper-multipath-0.4.9-62 (attachment 651875) | | |
Created attachment 651875 [details]
crash dump for device-mapper-multipath-0.4.9-62
I can 100% reproduce this problem on a storage-qe server. The same issue is found in RHEL 6.3, so this is not a regression.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

I got the same error message from multipathd while trying to increase the number of iSCSI sessions to ~1000. I doubled the number of sessions a few times using:

for i in /sys/devices/platform/host*/session* ; do
    echo $i
    iscsiadm -m session -r $i -o new
done

[root@lg509 ~]# multipath -ll
Jan 29 10:25:31 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]# /etc/init.d/multipathd status
multipathd dead but pid file exists
[root@lg509 ~]# /etc/init.d/multipathd restart
ux_socket_connect: Connection refused
Stopping multipathd daemon: [FAILED]
Starting multipathd daemon: [ OK ]
[root@lg509 ~]# multipath -ll
Jan 29 10:26:39 | 3514f0c532e00002b: error getting map status string
[root@lg509 ~]#

Working with 6.5, rpm versions:

[root@lg509 ~]# uname -r
2.6.32-431.el6.x86_64
[root@lg509 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-80.el6.x86_64
device-mapper-multipath-libs-0.4.9-80.el6.x86_64
[root@lg509 ~]#

The cause of this seems pretty likely to be that multipathd can't handle a device-mapper table/status line that big, and instead of failing gracefully, it's failing badly: multipathd looks to be overwriting memory.

This certainly fixes the memory corruption. However, I might end up putting a limit on the number of paths that multipath will create in the first place; I can't see any real practical use for 128 paths to a device.

I've fixed the memory corruption issue. But as I mentioned earlier, I did not make multipath able to handle arbitrarily large device tables. Multipath will still fail to create tables that are too large. This happens somewhere between 256 and 1024 paths, depending on how the device is configured.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1391.html

*** Bug 895110 has been marked as a duplicate of this bug. ***
Description of problem:
When testing multipath over iSCSI, multipathd crashed. The crash also prevents the daemon from restarting, with this error message:
===
multipathd: mpathdx: error getting map status string
===
The crash dump file is attached.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-62

How reproducible:
Only hit it twice.

Steps to Reproduce:
1. Create a multipath device over iSCSI.
2. Try to clone its iSCSI iface and iSCSI node:
===
for X in `seq 1 129`; do
    iscsiadm -m iface -o new -I gris_tmp_iface_$X;
    iscsiadm -m iface -I gris_tmp_iface_$X -o update \
        -n iface.initiatorname -v iqn.1994-05.com.redhat:gris-dev-2;
    iscsiadm -m iface -I gris_tmp_iface_$X -o update \
        -n iface.transport_name -v tcp;
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
        -I gris_tmp_iface_$X -p 10.16.43.127 -o new;
    iscsiadm -m node -T iqn.1992-08.com.netapp:sn.151753773 \
        -I gris_tmp_iface_$X -l;
done
===
3. Wait for udev.

Actual results:
multipathd crashes.

Expected results:
multipathd does not crash.

Additional info:
The reproduction script above is not the exact code I was using when testing; it is just a translation of the original Perl code.
The backtrace from gdb:
===============================
(gdb) bt
#0  0x00007fa4f12db8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa4f12dd085 in abort () at abort.c:92
#2  0x00007fa4f13197b7 in __libc_message (do_abort=2, fmt=0x7fa4f1400f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007fa4f131f0e6 in malloc_printerr (action=3, str=0x7fa4f14012c0 "double free or corruption (out)", ptr=<value optimized out>) at malloc.c:6311
#4  0x00007fa4f1321c13 in _int_free (av=0x7fa4f1637e80, p=0x7fa4cc06e170, have_lock=0) at malloc.c:4811
#5  0x0000000000408501 in ev_add_map (dev=0x7fa4cc001d10, vecs=<value optimized out>) at main.c:304
#6  0x0000000000408af2 in uev_add_map (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:235
#7  uev_trigger (uev=0x7fa4e80009f0, trigger_data=0x163df00) at main.c:731
#8  0x00007fa4f1c9d214 in service_uevq () at uevent.c:109
#9  0x00007fa4f1c9d2b7 in uevq_thread (et=<value optimized out>) at uevent.c:135
#10 0x00007fa4f1643851 in start_thread (arg=0x7fa4f2911700) at pthread_create.c:301
#11 0x00007fa4f139190d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
===============================