Bug 1171862 - Customer is using libmpathpersist and since updating to device-mapper-multipath-0.4.9-80.el6_6.1 is experiencing application coredumps
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: yanfu,wang
URL:
Whiteboard:
Depends On:
Blocks: 1187217
Reported: 2014-12-08 19:39 UTC by loberman
Modified: 2019-03-22 07:27 UTC (History)

Fixed In Version: device-mapper-multipath-0.4.9-83.el6
Doc Type: Bug Fix
Doc Text:
Previously, the libmultipath utility was keeping a global cache of sysfs data for all programs, even though this was only necessary for the multipathd daemon. As a consequence, a memory error could occur when multiple threads were using libmultipath without locking. This led to unexpected termination of multithreaded programs using the mpath_persistent_reserve_in() or mpath_persistent_reserve_out() functions. With this update, only multipathd uses the global sysfs data cache, and the described crashes are thus avoided.
Clone Of:
: 1187217 (view as bug list)
Environment:
Last Closed: 2015-07-22 07:26:50 UTC


Links
System: Red Hat Product Errata
ID: RHBA-2015:1391
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper-multipath bug fix and enhancement update
Last Updated: 2015-07-20 18:07:34 UTC

Description loberman 2014-12-08 19:39:53 UTC
Description of problem:
This customer develops their own applications against the device-mapper APIs and, in particular, uses mpath_persistent_reserve_in().
Their application is multithreaded and spawns parallel processes that make this call against the list of mpath devices.
Since updating to device-mapper-multipath-0.4.9-80.el6_6.1, they have been experiencing core dumps in the application.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-80.el6_6.1

How reproducible:
Customer runs their application (ftmounter) and this application segfaults.

Steps to Reproduce:
1. Start up the custom application
2. After a while, the application may dump core


Actual results:
Application segfaults

Expected results:
Application runs cleanly like it did on device-mapper-multipath-0.4.9-72.el6_5.2

Additional info:
Customer believes changes in libmultipath may be the catalyst here

Customer application
[loberman@dhcp-33-21 01285987]$ ldd ftmounter
ldd: warning: you do not have execution permission for `./ftmounter'
	linux-vdso.so.1 =>  (0x00007fffa93ab000)
	libLXplat.so => not found
	libLXftenv.so => not found
	libthread.so => not found
	libsgutils2.so.2 => /usr/lib64/libsgutils2.so.2 (0x0000003763a00000)
	libmpathpersist.so.0 => /lib64/libmpathpersist.so.0 (0x0000003761e00000)
	libmultipath.so => /lib64/libmultipath.so (0x0000003761200000)
	libsysfs.so.2 => not found
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000376b200000)
	libm.so.6 => /lib64/libm.so.6 (0x00007ff0ff5f8000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000376ae00000)
	libc.so.6 => /lib64/libc.so.6 (0x0000003760e00000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003761600000)
	libdevmapper.so.1.02 => /lib64/libdevmapper.so.1.02 (0x000000376f600000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003761a00000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003760a00000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003762a00000)
	libsepol.so.1 => /lib64/libsepol.so.1 (0x000000376ea00000)
	libudev.so.0 => /lib64/libudev.so.0 (0x00007ff0ff3e7000)

Customer captured a GDB trace
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/ftmounter...done.
[New Thread 25945]
[New Thread 25936]
[New Thread 25938]
[New Thread 25942]
[New Thread 25935]
[New Thread 25940]
[New Thread 16505]
[New Thread 25944]
[New Thread 25946]
Reading symbols from /sn/lib/64/libLXplat.so...done.
Loaded symbols for /sn/lib/64/libLXplat.so
Reading symbols from /sn/lib/64/libLXftenv.so...done.
Loaded symbols for /sn/lib/64/libLXftenv.so
Reading symbols from /sn/lib/64/libthread.so...done.
Loaded symbols for /sn/lib/64/libthread.so
Reading symbols from /usr/lib64/libsgutils2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsgutils2.so.2
Reading symbols from /lib64/libmpathpersist.so.0...Reading symbols from /usr/lib/debug/lib64/libmpathpersist.so.0.debug...done.
done.
Loaded symbols for /lib64/libmpathpersist.so.0
Reading symbols from /lib64/libmultipath.so...Reading symbols from /usr/lib/debug/lib64/libmultipath.so.debug...done.
done.
Loaded symbols for /lib64/libmultipath.so
Reading symbols from /usr/lib64/libsysfs.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsysfs.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libdevmapper.so.1.02...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdevmapper.so.1.02
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libsepol.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libsepol.so.1
Reading symbols from /lib64/libudev.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libudev.so.0
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/multipath/libcheckrdac.so...Reading symbols from /usr/lib/debug/lib64/multipath/libcheckrdac.so.debug...done.
done.
Loaded symbols for /lib64/multipath/libcheckrdac.so
Core was generated by `FTMOUNTER'.
Program terminated with signal 6, Aborted.
#0  0x00007ffdf42b7625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install device-mapper-libs-1.02.90-2.el6.x86_64 glibc-2.12-1.149.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libsepol-2.0.41-4.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 libsysfs-2.1.0-7.el6.x86_64 libudev-147-2.57.el6.x86_64 nss-softokn-freebl-3.14.3-17.el6.x86_64 sg3_utils-libs-1.28-6.el6.x86_64
(gdb) #0  0x00007ffdf42b7625 in raise () from /lib64/libc.so.6
#1  0x00007ffdf42b8e05 in abort () from /lib64/libc.so.6
#2  0x00007ffdf42f5537 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffdf42fae66 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffdf42fd9b3 in _int_free () from /lib64/libc.so.6
#5  0x00007ffdf4ff531c in sysfs_device_put (dev=0x7ffde000c500) at sysfs.c:328
#6  0x00007ffdf4fdad87 in free_path (pp=0x7ffddc0017b0) at structs.c:111
#7  0x00007ffdf4fddb6a in store_pathinfo (pathvec=0x7ffddc0057c0, hwtable=0x7ffde0000a40, devname=0x7ffddc005bbb "sdb", flag=<value optimized out>, pp_ptr=0x0) at discovery.c:59
#8  0x00007ffdf4fdde4b in path_discover (pathvec=0x7ffddc0057c0, conf=0x7ffde00008c0, flag=5) at discovery.c:92
#9  path_discovery (pathvec=0x7ffddc0057c0, conf=0x7ffde00008c0, flag=5) at discovery.c:132
#10 0x00007ffdf520e589 in mpath_persistent_reserve_in (fd=<value optimized out>, rq_servact=1, resp=0x7ffdd5769750, noisy=0, verbose=<value optimized out>) at mpath_persist.c:185
#11 0x000000000041fde6 in FTpgr::GetReservations (this=0x7ffdd576b7c0, list_p=0x7ffdd576dbe0) at /n/plfsR4001/SOURCE/R40APP.src/cc/ft/ha/com/FTpgr.C:525
#12 0x000000000040d8f5 in Access_Disk (arg=0x7ffde4025230) at /n/plfsR4001/SOURCE/SU2jR40.src/cc/ft/ha/mounter/FTdskgrp.C:2005
#13 0x00007ffdf406f9d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffdf436d9dd in clone () from /lib64/libc.so.6
(gdb) frame 5
#5  0x00007ffdf4ff531c in sysfs_device_put (dev=0x7ffde000c500) at sysfs.c:328
328					free(tmp);
(gdb) print *dev
 $1 = {node = {next = 0x7ffdf5208f80, prev = 0x7ffdf5208f80}, parent = 0x7ffde000cf20, devpath = "/devices/platform/host6/session5/target6:0:0/6:0:0:10/block/sde", '\000' <repeats 448 times>, subsystem = "block", '\000' <repeats 506 times>, kernel = "sde", '\000' <repeats 508 times>, kernel_number = '\000' <repeats 511 times>, driver = '\000' <repeats 511 times>}
(gdb) frame 6
#6  0x00007ffdf4fdad87 in free_path (pp=0x7ffddc0017b0) at structs.c:111
111			sysfs_device_put(pp->sysdev);
(gdb) print *pp
$2 = {dev = "sdb", '\000' <repeats 252 times>, dev_t = "8:16", '\000' <repeats 28 times>, sysdev = 0x7ffde000c500, scsi_id = {dev_id = 0, host_unique_id = 0, host_no = 0}, sg_id = {host_no = 1, channel = 0, scsi_id = 0, lun = 0, h_cmd_per_lun = 0, d_queue_depth = 0, proto_id = SCSI_PROTOCOL_UNSPEC, transport_id = 0}, wwid = '\000' <repeats 127 times>, vendor_id = "SEAGATE\000", product_id = "ST9300603SS\000\000\000\000\000", rev = "0006", serial = '\000' <repeats 63 times>, tgt_node_name = '\000' <repeats 223 times>, size = 585937500, checkint = 0, tick = 0, bus = 1, offline = 0, state = 0, dmstate = 0, chkrstate = 0, failcount = 0, priority = -1, pgindex = 0, detect_prio = 0, getuid = 0x0, prio_args = 0x0, prio = 0x0, checker = {node = {next = 0x0, prev = 0x0}, fd = 0, sync = 0, timeout = 0, disable = 0, name = '\000' <repeats 15 times>, message = '\000' <repeats 255 times>, context = 0x0, mpcontext = 0x0, check = 0, init = 0, free = 0}, mpp = 0x0, fd = -1, hwe = 0x0}
(gdb) 
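
The backtrace above aborts in free() inside sysfs_device_put(), and the Doc Text attributes the crashes to a global sysfs cache being used by several threads without locking. The following is only a minimal sketch of that general pattern, not libmultipath's code: the struct, the single-slot cache, and the function names entry_get(), entry_put(), and run_cache_demo() are all hypothetical. It shows a reference-counted cache entry whose get and put paths are serialized with a mutex so that only one thread can ever free the entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached sysfs entry; NOT libmultipath's struct. */
struct cache_entry {
    char devpath[64];
    int refcount;
};

/* Single-slot "global cache" shared by every thread (illustration only). */
static struct cache_entry *cache;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the cached entry, creating it on first use, and take a reference. */
static struct cache_entry *entry_get(const char *devpath)
{
    struct cache_entry *e;

    pthread_mutex_lock(&cache_lock);
    if (cache == NULL) {
        cache = calloc(1, sizeof(*cache));
        if (cache != NULL)
            strncpy(cache->devpath, devpath, sizeof(cache->devpath) - 1);
    }
    if (cache != NULL)
        cache->refcount++;
    e = cache;
    pthread_mutex_unlock(&cache_lock);
    return e;
}

/*
 * Drop a reference and free the entry when the last one goes away.
 * Without the lock, two threads could both observe the count reaching
 * zero and both call free() on the same pointer.
 */
static void entry_put(struct cache_entry *e)
{
    if (e == NULL)
        return;
    pthread_mutex_lock(&cache_lock);
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        if (cache == e)
            cache = NULL;
        free(e);
    }
    pthread_mutex_unlock(&cache_lock);
}

/* Each worker repeatedly takes and releases a reference. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        entry_put(entry_get("/devices/example/block/sdb"));
    return NULL;
}

/* Run nthreads workers; return 0 if all ran and the cache ended up empty. */
int run_cache_demo(int nthreads)
{
    pthread_t tids[16];
    int started = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return (started == nthreads && cache == NULL) ? 0 : -1;
}
```

Calling run_cache_demo(4) exercises the locked paths from four threads. Removing the two lock/unlock pairs turns the sketch into the kind of unsynchronized access the Doc Text describes, where heap corruption becomes possible.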

I have not coded against these libraries before, and I have been trying to develop a test program to emulate the failure here.
So far I have not been able to get the code to link properly.
I need to get the Engineering development folks involved to assist with the mechanisms here, and I will then attempt to reproduce.

Steps I took:
1. Create a C program
2. Call mpath_lib_init()
3. For each device in a list of mpath devices such as /dev/mapper/mpath[a-x], call mpath_persistent_reserve_in()
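
The threading side of such a test program could be sketched as below. This is a hedged outline only: query_fn, struct job, stub_query(), and query_all() are hypothetical names, and the stub stands in for the real library call so that the harness builds without libmpathpersist. In an actual reproducer, the callback would call mpath_persistent_reserve_in() with the arguments visible in the backtrace (fd, rq_servact=1, resp, noisy=0, verbose), after a single mpath_lib_init() call made before the threads start.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical callback type: query one open device, return 0 on success. */
typedef int (*query_fn)(int fd);

/* Per-thread work item: which device to open and what to run on it. */
struct job {
    const char *devpath;
    query_fn query;
    int result;
};

/* Thread body: open the device, run the query, record the result. */
static void *query_thread(void *arg)
{
    struct job *j = arg;
    int fd = open(j->devpath, O_RDONLY);

    if (fd < 0) {
        j->result = -1;
        return NULL;
    }
    j->result = j->query(fd);
    close(fd);
    return NULL;
}

/*
 * Placeholder for the real call. A reproducer would invoke
 * mpath_persistent_reserve_in() on fd here instead (assumption:
 * linked against libmpathpersist, which this sketch does not do).
 */
int stub_query(int fd)
{
    return fd >= 0 ? 0 : -1;
}

/*
 * Start one thread per device so the queries run in parallel, wait for
 * all of them, and return the number of devices whose query failed
 * (or -1 if ndevs is out of range).
 */
int query_all(const char **devs, int ndevs, query_fn query)
{
    enum { MAX_DEVS = 32 };
    pthread_t tids[MAX_DEVS];
    struct job jobs[MAX_DEVS];
    int started = 0, failed = 0;

    if (ndevs < 0 || ndevs > MAX_DEVS)
        return -1;
    for (; started < ndevs; started++) {
        jobs[started].devpath = devs[started];
        jobs[started].query = query;
        jobs[started].result = -1;
        if (pthread_create(&tids[started], NULL, query_thread,
                           &jobs[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (jobs[i].result != 0)
            failed++;
    }
    /* Devices whose thread never started also count as failures. */
    return failed + (ndevs - started);
}
```

A caller would pass an array of device paths, for example the /dev/mapper/mpath* nodes, together with the query callback. Running the harness in a loop may be needed, since the report says the core dump only appears after a while.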

Looking for assistance and perhaps bmarzins to pick this up.

Comment 2 Ben Marzinski 2014-12-09 19:09:57 UTC
libmultipath isn't meant to be used outside of the device-mapper-multipath programs. It simply exists to avoid code duplication between these programs. The ABI of this library very likely WILL change between releases. I don't think that coding to this library is a stable solution.

Comment 3 loberman 2014-12-09 19:14:54 UTC
Hello Ben
Thank You.
I will make the customer aware of this.
Laurence

Comment 29 Ben Marzinski 2015-01-24 04:01:26 UTC
Pushed this fix.

Comment 37 errata-xmlrpc 2015-07-22 07:26:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1391.html

