Bug 996921

Summary: multipathd: segfault at 18 ip 00007f93840371f5 sp 00007fffef217bf0 error 4 in libc-2.12.so[7f9383fc1000+18a000]
Product: Red Hat Enterprise Linux 6 Reporter: Bruno Goncalves <bgoncalv>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Bruno Goncalves <bgoncalv>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.5 CC: acathrow, agk, bmarzins, dwysocha, heinzm, msnitzer, prajnoha, prockai, tlavigne, yanwang, zkabelac
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: device-mapper-multipath-0.4.9-70.el6 Doc Type: Bug Fix
Doc Text:
Regression caused by RHEL-6.4 fix. Never in released code. No documentation needed.
Last Closed: 2013-11-21 07:51:29 UTC Type: Bug
Attachments:
Description Flags
abrt log file none

Description Bruno Goncalves 2013-08-14 09:39:56 UTC
Description of problem:

While booting an iSCSI system, this segfault happened.

multipathd[164]: segfault at 18 ip 00007f93840371f5 sp 00007fffef217bf0 error 4 in libc-2.12.so[7f9383fc1000+18a000]

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-69.el6

How reproducible:
100%

Steps to Reproduce:
1. Install a RHEL-6.5 image containing device-mapper-multipath-0.4.9-69.el6 on an iSCSI boot server.
2. Boot up the server.
3. Check dmesg for the segfault.


Additional info:
It does not happen with RHEL-6.4

Comment 2 Bruno Goncalves 2013-08-14 09:42:05 UTC
Created attachment 786472 [details]
abrt log file

Comment 3 Bruno Goncalves 2013-08-14 09:43:29 UTC
Not sure if this is related, but when trying to reboot the server this error occurred:

Stopping monitoring for VG vg_storageqe01:   2 logical volume(s) in volume group "vg_storageqe01" unmonitored
[  OK  ]
Possible FCoE root detected, not stopping FCoE.
Not stopping iscsid: iscsi sessions still active[WARNING]
Root is on a multipathed device, multipathd can not be stopped
Sending all processes the TERM signal... *** glibc detected *** /sbin/multipathd: corrupted double-linked list: 0x0000000001ee9180 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76166)[0x7f1a53d59166]
/lib64/libc.so.6(+0x765ed)[0x7f1a53d595ed]
/lib64/libc.so.6(+0x79405)[0x7f1a53d5c405]
/lib64/libc.so.6(__libc_calloc+0xc6)[0x7f1a53d5d626]
/lib64/libc.so.6(open_memstream+0x6d)[0x7f1a53d5222d]
/lib64/libc.so.6(__vsyslog_chk+0x9b)[0x7f1a53dc7bab]
/lib64/libc.so.6(__syslog_chk+0x83)[0x7f1a53dc8183]
/lib64/libmultipath.so(+0x2beed)[0x7f1a546ddeed]
/lib64/libmultipath.so(log_thread_stop+0x4a)[0x7f1a546ddf3a]
/sbin/multipathd[0x406d03]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f1a53d01d1d]
/sbin/multipathd[0x405139]
======= Memory map: ========
00400000-00410000 r-xp 00000000 fd:03 785098                             /sbin/multipathd
00610000-00611000 rw-p 00010000 fd:03 785098                             /sbin/multipathd
01ed9000-01efa000 rw-p 00000000 00:00 0 
01efa000-01f1b000 rw-p 00000000 00:00 0 
7f1a34000000-7f1a34021000 rw-p 00000000 00:00 0 
7f1a34021000-7f1a38000000 ---p 00000000 00:00 0 
7f1a38000000-7f1a38022000 rw-p 00000000 00:00 0 
7f1a38022000-7f1a3c000000 ---p 00000000 00:00 0 
7f1a3c000000-7f1a3c021000 rw-p 00000000 00:00 0 
7f1a3c021000-7f1a40000000 ---p 00000000 00:00 0 
7f1a40000000-7f1a40021000 rw-p 00000000 00:00 0 
7f1a40021000-7f1a44000000 ---p 00000000 00:00 0 
7f1a44000000-7f1a44021000 rw-p 00000000 00:00 0 
7f1a44021000-7f1a48000000 ---p 00000000 00:00 0 
7f1a48000000-7f1a48021000 rw-p 00000000 00:00 0 
7f1a48021000-7f1a4c000000 ---p 00000000 00:00 0 
7f1a4c000000-7f1a4c021000 rw-p 00000000 00:00 0 
7f1a4c021000-7f1a50000000 ---p 00000000 00:00 0 
7f1a52a4f000-7f1a52a51000 r-xp 00000000 fd:03 2485880                    /lib64/multipath/libprioontap.so
7f1a52a51000-7f1a52c50000 ---p 00002000 fd:03 2485880                    /lib64/multipath/libprioontap.so
7f1a52c50000-7f1a52c51000 rw-p 00001000 fd:03 2485880                    /lib64/multipath/libprioontap.so
7f1a52c51000-7f1a52c54000 r-xp 00000000 fd:03 2485874                    /lib64/multipath/libchecktur.so
7f1a52c54000-7f1a52e53000 ---p 00003000 fd:03 2485874                    /lib64/multipath/libchecktur.so
7f1a52e53000-7f1a52e54000 rw-p 00002000 fd:03 2485874                    /lib64/multipath/libchecktur.so
7f1a52e54000-7f1a52e55000 r-xp 00000000 fd:03 2485876                    /lib64/multipath/libprioconst.so
7f1a52e55000-7f1a53054000 ---p 00001000 fd:03 2485876                    /lib64/multipath/libprioconst.so
7f1a53054000-7f1a53055000 rw-p 00000000 fd:03 2485876                    /lib64/multipath/libprioconst.so
7f1a53055000-7f1a53056000 r-xp 00000000 fd:03 2485714                    /lib64/libaio.so.1.0.1
7f1a53056000-7f1a53255000 ---p 00001000 fd:03 2485714                    /lib64/libaio.so.1.0.1
7f1a53255000-7f1a53256000 rw-p 00000000 fd:03 2485714                    /lib64/libaio.so.1.0.1
7f1a53256000-7f1a53258000 r-xp 00000000 fd:03 2485868                    /lib64/multipath/libcheckdirectio.so
7f1a53258000-7f1a53457000 ---p 00002000 fd:03 2485868                    /lib64/multipath/libcheckdirectio.so
7f1a53457000-7f1a53458000 rw-p 00001000 fd:03 2485868                    /lib64/multipath/libcheckdirectio.so
7f1a53458000-7f1a53475000 r-xp 00000000 fd:03 2485560                    /lib64/libtinfo.so.5.7
7f1a53475000-7f1a53675000 ---p 0001d000 fd:03 2485560                    /lib64/libtinfo.so.5.7
7f1a53675000-7f1a53679000 rw-p 0001d000 fd:03 2485560                    /lib64/libtinfo.so.5.7
7f1a53679000-7f1a53685000 r-xp 00000000 fd:03 2485631                    /lib64/libudev.so.0.5.1
7f1a53685000-7f1a53885000 ---p 0000c000 fd:03 2485631                    /lib64/libudev.so.0.5.1
7f1a53885000-7f1a53886000 r--p 0000c000 fd:03 2485631                    /lib64/libudev.so.0.5.1
7f1a53886000-7f1a53887000 rw-p 0000d000 fd:03 2485631                    /lib64/libudev.so.0.5.1
7f1a53887000-7f1a538c2000 r-xp 00000000 fd:03 2485589                    /lib64/libsepol.so.1
7f1a538c2000-7f1a53ac2000 ---p 0003b000 fd:03 2485589                    /lib64/libsepol.so.1
7f1a53ac2000-7f1a53ac3000 r--p 0003b000 fd:03 2485589                    /lib64/libsepol.so.1
7f1a53ac3000-7f1a53ac4000 rw-p 0003c000 fd:03 2485589                    /lib64/libsepol.so.1
7f1a53ac4000-7f1a53ae1000 r-xp 00000000 fd:03 2485590                    /lib64/libselinux.so.1
7f1a53ae1000-7f1a53ce0000 ---p 0001d000 fd:03 2485590                    /lib64/libselinux.so.1
7f1a53ce0000-7f1a53ce1000 r--p 0001c000 fd:03 2485590                    /lib64/libselinux.so.1
7f1a53ce1000-7f1a53ce2000 rw-p 0001d000 fd:03 2485590                    /lib64/libselinux.so.1
7f1a53ce2000-7f1a53ce3000 rw-p 00000000 00:00 0 
7f1a53ce3000-7f1a53e6d000 r-xp 00000000 fd:03 2485518                    /lib64/libc-2.12.so
7f1a53e6d000-7f1a5406d000 ---p 0018a000 fd:03 2485518                    /lib64/libc-2.12.so
7f1a5406d000-7f1a54071000 r--p 0018a000 fd:03 2485518                    /lib64/libc-2.12.so
7f1a54071000-7f1a54072000 rw-p 0018e000 fd:03 2485518                    /lib64/libc-2.12.so
7f1a54072000-7f1a54077000 rw-p 00000000 00:00 0 
7f1a54077000-7f1a5408e000 r-xp 00000000 fd:03 2485542                    /lib64/libpthread-2.12.so
7f1a5408e000-7f1a5428e000 ---p 00017000 fd:03 2485542                    /lib64/libpthread-2.12.so
7f1a5428e000-7f1a5428f000 r--p 00017000 fd:03 2485542                    /lib64/libpthread-2.12.so
7f1a5428f000-7f1a54290000 rw-p 00018000 fd:03 2485542                    /lib64/libpthread-2.12.so
7f1a54290000-7f1a54294000 rw-p 00000000 00:00 0 
7f1a54294000-7f1a542aa000 r-xp 00000000 fd:03 2485506                    /lib64/libgcc_s-4.4.7-20120601.so.1
7f1a542aa000-7f1a544a9000 ---p 00016000 fd:03 2485506                    /lib64/libgcc_s-4.4.7-20120601.so.1
7f1a544a9000-7f1a544aa000 rw-p 00015000 fd:03 2485506                    /lib64/libgcc_s-4.4.7-20120601.so.1
fcoemon: error 111 Connection refused
7f1a544aa000-7f1a544b2000 r-xp 00000000 fd:03 2485864                    /lib64/libmpathpersist.so.0
7f1a544b2000-7f1a546b1000 ---p 00008000 fd:03 2485864                    /lib64/libmpathpersist.so.0
7f1a546b1000-7f1a546b2000 rw-p 00007000 fd:03 2485864                    /lib64/libmpathpersist.so.0
7f1a546b2000-7f1a546f0000 r-xp 00000000 fd:03 2485865                    /lib64/libmultipath.so
7f1a546f0000-7f1a548f0000 ---p 0003e000 fd:03 2485865                    /lib64/libmultipath.so
7f1a548f0000-7f1a548f4000 rw-p 0003e000 fd:03 2485865                    /lib64/libmultipath.so
7f1a548f4000-7f1a548f6000 rw-p 00000000 00:00 0 
7f1a548f6000-7f1a548f8000 r-xp 00000000 fd:03 2485524                    /lib64/libdl-2.12.so
7f1a548f8000-7f1a54af8000 ---p 00002000 fd:03 2485524                    /lib64/libdl-2.12.so
7f1a54af8000-7f1a54af9000 r--p 00002000 fd:03 2485524                    /lib64/libdl-2.12.so
7f1a54af9000-7f1a54afa000 rw-p 00003000 fd:03 2485524                    /lib64/libdl-2.12.so
[  OK  ]
7f1a54afa000-7f1a54b1c000 r-xp 00000000 fd:03 2485556                    /lib64/libncurses.so.5.7
7f1a54b1c000-7f1a54d1b000 ---p 00022000 fd:03 2485556                    /lib64/libncurses.so.5.7
7f1a54d1b000-7f1a54d1c000 rw-p 00021000 fd:03 2485556                    /lib64/libncurses.so.5.7
7f1a54d1c000-7f1a54d56000 r-xp 00000000 fd:03 2485588                    /lib64/libreadline.so.6.0
7f1a54d56000-7f1a54f56000 ---p 0003a000 fd:03 2485588                    /lib64/libreadline.so.6.0
7f1a54f56000-7f1a54f5e000 rw-p 0003a000 fd:03 2485588                    /lib64/libreadline.so.6.0
7f1a54f5e000-7f1a54f5f000 rw-p 00000000 00:00 0 
7f1a54f5f000-7f1a54f95000 r-xp 00000000 fd:03 2485839                    /lib64/libdevmapper.so.1.02
7f1a54f95000-7f1a55195000 ---p 00036000 fd:03 2485839                    /lib64/libdevmapper.so.1.02
7f1a55195000-7f1a55198000 rw-p 00036000 fd:03 2485839                    /lib64/libdevmapper.so.1.02
fcoemon: Failed write req D len 1

f1a55199000 rw-p 00000000 00:00 0 
7f1a55199000-7f1a551b9000 r-xp 00000000 fd:03 2485511                    /lib64/ld-2.12.so
7f1a55336000-7f1a55337000 ---p 00000000 00:00 0 
7f1a55337000-7f1a5533e000 rw-p 00000000 00:00 0 
7f1a5533e000-7f1a5533f000 ---p 00000000 00:00 0 
7f1a5533f000-7f1a55346000 rw-p 00000000 00:00 0 
7f1a55346000-7f1a55347000 ---p 00000000 00:00 0 
7f1a55347000-7f1a55357000 rw-p 00000000 00:00 0 
7f1a55357000-7f1a55358000 ---p 00000000 00:00 0 
7f1a55358000-7f1a55368000 rw-p 00000000 00:00 0 
7f1a55368000-7f1a55369000 ---p 00000000 00:00 0 
7f1a55369000-7f1a55370000 rw-p 00000000 00:00 0 
7f1a55370000-7f1a55371000 ---p 00000000 00:00 0 
7f1a55371000-7f1a55381000 rw-p 00000000 00:00 0 
7f1a55381000-7f1a55382000 ---p 00000000 00:00 0 
7f1a55382000-7f1a55392000 rw-p 00000000 00:00 0 
7f1a55392000-7f1a55393000 ---p 00000000 00:00 0 
7f1a55393000-7f1a553a3000 rw-p 00000000 00:00 0 
7f1a553a3000-7f1a553aa000 rw-p 00000000 00:00 0 
7f1a553ae000-7f1a553af000 rw-p 00000000 00:00 0 
7f1a553af000-7f1a553b0000 ---p 00000000 00:00 0 
7f1a553b0000-7f1a553b7000 rw-p 00000000 00:00 0 
7f1a553b7000-7f1a553b8000 rw-p 00000000 00:00 0 
7f1a553b8000-7f1a553b9000 r--p 0001f000 fd:03 2485511                    /lib64/ld-2.12.so
7f1a553b9000-7f1a553ba000 rw-p 00020000 fd:03 2485511                    /lib64/ld-2.12.so
7f1a553ba000-7f1a553bb000 rw-p 00000000 00:00 0 
7fff75e69000-7fff75e7e000 rw-p 00000000 00:00 0                          [stack]
7fff75f9c000-7fff75f9d000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Comment 5 Ben Marzinski 2013-08-15 17:38:28 UTC
Comment #3 definitely looks related.  The backtrace for this crash is:

#0  0x00007efff52951f5 in malloc_consolidate () from /lib64/libc.so.6
#1  0x00007efff5298405 in _int_malloc () from /lib64/libc.so.6
#2  0x00007efff5299626 in calloc () from /lib64/libc.so.6
#3  0x00007efff528e22d in open_memstream () from /lib64/libc.so.6
#4  0x00007efff5303bab in __vsyslog_chk () from /lib64/libc.so.6
#5  0x00007efff5304183 in __syslog_chk () from /lib64/libc.so.6
#6  0x00007efff5c19eed in flush_logqueue () at log_pthread.c:43
#7  0x00007efff5c19f3a in log_thread_stop () at log_pthread.c:92
#8  0x0000000000406d03 in child (argc=<value optimized out>,
    argv=<value optimized out>) at main.c:1693
#9  main (argc=<value optimized out>, argv=<value optimized out>)
    at main.c:1854


The good news is that it appears to be crashing while multipathd is stopping. I assume this happens when you pivot from the initramfs to the actual root and stop the initramfs multipathd. That should be OK: multipathd was stopping anyway, and it will be restarted normally by the system later in the boot process. Is this what you see?

I'm still looking into the cause of the crash. If I could get on a system that recreates it easily, that would speed things up.

Comment 9 Ben Marzinski 2013-08-19 23:30:33 UTC
Up through 6.4, multipath's per-device waiter threads weren't getting stopped when they were supposed to. On reconfigure, for example, the old waiter threads would stick around until a dm event was issued for the device. I fixed this for RHEL 6.5. Unfortunately, the shutdown code wasn't properly waiting for these threads before deallocating memory that they were using. When the threads were just stuck waiting on a dm event, this didn't cause problems, but now that the threads were trying to stop at the same time as the rest of the multipath code, this bug started appearing. With the fix, the main multipath thread no longer tries to free this memory while the waiter threads are shutting down.

Comment 11 Bruno Goncalves 2013-08-20 15:19:27 UTC
Verified fix on 
rpm -q device-mapper-multipath
device-mapper-multipath-0.4.9-70.el6.x86_64

Comment 12 Ben Marzinski 2013-08-29 21:04:22 UTC
*** Bug 1001455 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2013-11-21 07:51:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1574.html