Bug 1349376

Summary: multipath -ll crashes
Product: Red Hat Enterprise Linux 6 Reporter: nikhil kshirsagar <nkshirsa>
Component: device-mapper-multipathAssignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Lin Li <lilin>
Severity: medium Docs Contact: Milan Navratil <mnavrati>
Priority: unspecified    
Version: 6.5CC: aghadge, agk, bmarzins, heinzm, jdeenada, jjose, lilin, loberman, msnitzer, nkshirsa, prajnoha, rbalakri, wevans, yizhan, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: device-mapper-multipath-0.4.9-94.el6 Doc Type: Bug Fix
Doc Text:
multipath no longer crashes due to *libdevmapper* version mismatches Previously, the multipath code was not linking to the correct libraries during a part of compilation. As a consequence, if *device-mapper-multipath* was used with a newer version of the *libdevmapper* library than it was compiled with, multipath sometimes terminated unexpectedly. Now, multipath correctly links to the necessary libraries during compilation. As a result, multipath no longer crashes due to the library version mismatch.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-21 10:51:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1269194, 1324930    

Description nikhil kshirsagar 2016-06-23 10:51:12 UTC
Description of problem:
 
# multipath -ll
3514f0c5f27e0019c dm-1 XtremIO,XtremApp
size=500G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 3:0:0:1 sdb 8:16  active ready running
  |- 3:0:1:1 sdc 8:32  active ready running
  |- 3:0:2:1 sdd 8:48  active ready running
  |- 3:0:4:1 sdf 8:80  active ready running
  |- 3:0:3:1 sde 8:64  active ready running
  |- 3:0:5:1 sdg 8:96  active ready running
  |- 3:0:6:1 sdh 8:112 active ready running
  |- 3:0:7:1 sdi 8:128 active ready running
  |- 4:0:0:1 sdj 8:144 active ready running
  |- 4:0:1:1 sdk 8:160 active ready running
  |- 4:0:2:1 sdl 8:176 active ready running
  |- 4:0:3:1 sdm 8:192 active ready running
  |- 4:0:4:1 sdn 8:208 active ready running
  |- 4:0:5:1 sdo 8:224 active ready running
  |- 4:0:6:1 sdp 8:240 active ready running
  `- 4:0:7:1 sdq 65:0  active ready running
*** glibc detected *** multipath: free(): invalid next size (fast): 0x0000000000e12a50 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3e91a75e66]
/lib64/libc.so.6[0x3e91a789b3]
/lib64/libmultipath.so(free_multipath+0x5d)[0x32dda16f5d]
/lib64/libmultipath.so(free_multipathvec+0x4d)[0x32dda1701d]
multipath[0x402195]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3e91a1ed5d]
multipath[0x401b99]


Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-80.el6.x86_64



Additional info:


]# valgrind --tool=memcheck multipath -ll
==7783== Memcheck, a memory error detector
==7783== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==7783== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==7783== Command: multipath -ll
==7783==
Jun 22 09:58:31 | can't set open fds limit to 8192 : Operation not permitted

==7783== Invalid write of size 8
==7783==    at 0x32DD641581: dm_task_get_info (string3.h:85)
==7783==    by 0x32DDA10492: dm_get_info (devmapper.c:1048)
==7783==    by 0x32DDA10E09: dm_get_maps (devmapper.c:803)
==7783==    by 0x402385: main (main.c:171)
==7783==  Address 0x4cbfce8 is 0 bytes after a block of size 40 alloc'd
==7783==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7783==    by 0x32DDA0E955: zalloc (memory.c:41)
==7783==    by 0x32DDA10459: dm_get_info (devmapper.c:1019)
==7783==    by 0x32DDA10E09: dm_get_maps (devmapper.c:803)
==7783==    by 0x402385: main (main.c:171)
==7783==
==7783== Invalid write of size 4
==7783==    at 0x32DD6415D9: dm_task_get_info (libdm-iface.c:702)
==7783==    by 0x32DDA10492: dm_get_info (devmapper.c:1048)
==7783==    by 0x32DDA10E09: dm_get_maps (devmapper.c:803)
==7783==    by 0x402385: main (main.c:171)
==7783==  Address 0x4cbfcec is 4 bytes after a block of size 40 alloc'd
==7783==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7783==    by 0x32DDA0E955: zalloc (memory.c:41)
==7783==    by 0x32DDA10459: dm_get_info (devmapper.c:1019)
==7783==    by 0x32DDA10E09: dm_get_maps (devmapper.c:803)
==7783==    by 0x402385: main (main.c:171)
==7783==
==7783== Invalid write of size 4
==7783==    at 0x32DD6415DC: dm_task_get_info (libdm-iface.c:701)
==7783==    by 0x32DDA10492: dm_get_info (devmapper.c:1048)
==7783==    by 0x32DDA10E09: dm_get_maps (devmapper.c:803)
==7783==    by 0x402385: main (main.c:171)
==7783==  Address 0x4cbfce8 is 0 bytes after a block of size 40 alloc'd
==7783==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7783==    by 0x32DDA0E955: zalloc (memory.c:41)
==7783==    by 0x32DDA10459: dm_get_info (devmapper.c:1019)
==7783==    by 0x32DDA10E09: dm_get_maps (devmapper.c:803)
==7783==    by 0x402385: main (main.c:171)
==7783==
3514f0c5f27e0019c dm-1 XtremIO,XtremApp
size=500G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 3:0:0:1 sdb 8:16  active ready running
  |- 3:0:1:1 sdc 8:32  active ready running
  |- 3:0:2:1 sdd 8:48  active ready running
  |- 3:0:4:1 sdf 8:80  active ready running
  |- 3:0:3:1 sde 8:64  active ready running
  |- 3:0:5:1 sdg 8:96  active ready running
  |- 3:0:6:1 sdh 8:112 active ready running
  |- 3:0:7:1 sdi 8:128 active ready running
  |- 4:0:0:1 sdj 8:144 active ready running
  |- 4:0:1:1 sdk 8:160 active ready running
  |- 4:0:2:1 sdl 8:176 active ready running
  |- 4:0:3:1 sdm 8:192 active ready running
  |- 4:0:4:1 sdn 8:208 active ready running
  |- 4:0:5:1 sdo 8:224 active ready running
  |- 4:0:6:1 sdp 8:240 active ready running
  `- 4:0:7:1 sdq 65:0  active ready running
==7783==
==7783== HEAP SUMMARY:
==7783==     in use at exit: 5,556 bytes in 17 blocks
==7783==   total heap usage: 1,770 allocs, 1,753 frees, 666,062 bytes allocated
==7783==
==7783== LEAK SUMMARY:
==7783==    definitely lost: 0 bytes in 0 blocks
==7783==    indirectly lost: 0 bytes in 0 blocks
==7783==      possibly lost: 0 bytes in 0 blocks
==7783==    still reachable: 5,556 bytes in 17 blocks
==7783==         suppressed: 0 bytes in 0 blocks
==7783== Rerun with --leak-check=full to see details of leaked memory
==7783==
==7783== For counts of detected and suppressed errors, rerun with: -v
==7783== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 17 from 9)


-----------------------------------------------------------------------------



The core just shows the point at which the system has detected the corruption causing the segmentation fault. The gdb stack trace will not help analyse where the corruption has occurred. 

When you malloc some memory, malloc allocates a little more memory before and after your memory requested, for its internal bookkeeping and memory management. We see the mpp->dmi as a valid structure with a valid address. (non null). Free() has no problems freeing this area. But when you free a chunk of memory, free() checks if the previous chunk is free and whether the following chunk is free. If either of those are free then free() must merge the chunk you just freed with adjacent free chunks. For non free chunks, it needs to be able to figure out how big it is.

This is where free seems to be crashing because the information storing the size of the next chunk has been corrupted. 

(gdb) where
#0  0x0000003e91a32625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003e91a33e05 in abort () at abort.c:92
#2  0x0000003e91a70537 in __libc_message (do_abort=2, fmt=0x3e91b58780 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0000003e91a75e66 in malloc_printerr (action=3, str=0x3e91b58aa8 "free(): invalid next size (fast)", ptr=<value optimized out>) at malloc.c:6336
#4  0x0000003e91a789b3 in _int_free (av=0x3e91d8fe80, p=0xf89a40, have_lock=0) at malloc.c:4832
#5  0x00000032dda16f5d in free_multipath (mpp=0xf832e0, free_paths=0) at structs.c:241
#6  0x00000032dda1701d in free_multipathvec (mpvec=0xf7f1b0, free_paths=0) at structs.c:285
#7  0x0000000000402195 in configure (argc=<value optimized out>, argv=<value optimized out>) at main.c:373
#8  main (argc=<value optimized out>, argv=<value optimized out>) at main.c:586


(gdb) p *mpp->dmi
$4 = {exists = 1, suspended = 0, live_table = 1, inactive_table = 0, open_count = 1, event_nr = 1, major = 253, minor = 1, read_only = 0, target_count = 1}
(gdb) ptype mpp->dmi
type = struct dm_info {
    int exists;
    int suspended;
    int live_table;
    int inactive_table;
    int32_t open_count;
    uint32_t event_nr;
    uint32_t major;
    uint32_t minor;
    int read_only;
    int32_t target_count;
} *
(gdb) 


int dm_task_get_info(struct dm_task *dmt, struct dm_info *info)
{
        if (!dmt->dmi.v4)
                return 0;

        memset(info, 0, sizeof(*info));  

        info->exists = dmt->dmi.v4->flags & DM_EXISTS_FLAG ? 1 : 0;
        if (!info->exists)
                return 1;

        info->suspended = dmt->dmi.v4->flags & DM_SUSPEND_FLAG ? 1 : 0;
        info->read_only = dmt->dmi.v4->flags & DM_READONLY_FLAG ? 1 : 0; 
        info->live_table = dmt->dmi.v4->flags & DM_ACTIVE_PRESENT_FLAG ? 1 : 0;
        info->inactive_table = dmt->dmi.v4->flags & DM_INACTIVE_PRESENT_FLAG ?
            1 : 0;
        info->deferred_remove = dmt->dmi.v4->flags & DM_DEFERRED_REMOVE;
        info->internal_suspend = (dmt->dmi.v4->flags & DM_INTERNAL_SUSPEND_FLAG) ? 1 : 0;
        info->target_count = dmt->dmi.v4->target_count;
        info->open_count = dmt->dmi.v4->open_count;
        info->event_nr = dmt->dmi.v4->event_nr;
        info->major = MAJOR(dmt->dmi.v4->dev);
        info->minor = MINOR(dmt->dmi.v4->dev);

        return 1;
}

Comment 10 Jerrin Jose 2016-06-28 16:05:16 UTC
Customer has updated that, he did upgrade the device-mapper-multipath package using the "yum update device-mapper-multipath" and the issue got resolved.


=========================================================================================================================
 Package                                  Arch               Version                  Repository                    Size
=========================================================================================================================
Updating:
 device-mapper-multipath                  x86_64             0.4.9-93.el6             rhel6-local-repo             128 k
Updating for dependencies:
 device-mapper-multipath-libs             x86_64             0.4.9-93.el6             rhel6-local-repo             198 k
 kpartx                                   x86_64             0.4.9-93.el6             rhel6-local-repo              67 k

Transaction Summary
=========================================================================================================================

Comment 14 Ben Marzinski 2016-10-17 13:56:30 UTC
updated makefile to link to link to the necessary libraries during compilation.

Comment 25 errata-xmlrpc 2017-03-21 10:51:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0697.html