Bug 895110
| Summary: | multipathd crash when having many interfaces | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Bruno Goncalves <bgoncalv> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED DUPLICATE | QA Contact: | Bruno Goncalves <bgoncalv> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.4 | CC: | agk, bmarzins, dwysocha, hateya, heinzm, msnitzer, prajnoha, prockai, rbalakri, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-10-14 16:13:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 678263 [details]
multipath_malloc error
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
How many LUNs are on the server? I'm not able to recreate this with the identical packages, using 100 interfaces to 10 LUNs.
I was able to reproduce it using tgtd with 1 target with 1 LUN.
cat /etc/tgt/targets.conf
default-driver iscsi
<target iqn.2009-10.com.redhat:storage-1>
    write-cache off
    allow-in-use yes
    <backing-store /var/lib/tgtd/loop-disk-1-1>
        scsi_sn 1
        scsi_id 1
        lun 1
    </backing-store>
</target>
-------------
for i in {1..100}; do
    iscsiadm -m iface -o new -I multi_iqn_$i;
    iscsiadm -m iface -I multi_iqn_$i -o update -n iface.initiatorname -v iqn.1994-05.com.redhat:multi-iqn-1;
    iscsiadm -m discovery -t st -p 127.0.0.1 -I multi_iqn_$i -o new;
done
# multipath -l
#
# iscsiadm -m node -l
# multipath -l | grep sd | wc
38 303 1824
#
NOTE: I could not see the error being logged in any file, only on the console. It also showed:
mp->params too small
mp->params too small
mp->params too small
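The `grep sd | wc` count used in this comment also matches any other line that happens to contain "sd". A minimal sketch of a stricter counter follows, assuming path lines carry an `H:C:T:L sdX major:minor` triple as `multipath -l` normally prints them. `count_paths` is a hypothetical helper name and is not part of multipath-tools.

```shell
#!/bin/sh
# count_paths: hypothetical helper, not part of multipath-tools.
# Reads `multipath -l` style output on stdin and counts the lines that
# contain an H:C:T:L address, an sd device name, and a major:minor pair.
count_paths() {
    # grep -c prints the count; `|| true` keeps the exit status at 0
    # when no line matches (grep itself would return 1 in that case).
    grep -cE '[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+ +[0-9]+:[0-9]+' || true
}

# Usage (assumes multipath-tools is installed on the host):
#   multipath -l | count_paths
```

This only counts what the tool printed, so it cannot detect paths that are missing from the output.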
Hrm.. That was the first thing I tried. I didn't have the targets on the same node, however. I'll try that. Could you also post your /etc/multipath.conf?
Created attachment 691422 [details]
multipath.conf
I don't have much hope for this, but can you try the packages at:
http://download.devel.redhat.com/brewroot/scratch/bmarzins/task_5351856/
Those will fix the "mp->params too small" messages.
rpm -q device-mapper-multipath
device-mapper-multipath-0.4.9-64.el6.bz895110.x86_64
With this build there are no error messages, but it seems it can still only handle 51 paths.
iscsiadm -m session -P3 | grep sd | wc
100 600 4274
multipath -l | grep sd | wc
51 408 2448
There are some error messages when logging out with iscsiadm -m node -u:
device-mapper: table: 253:2: multipath: error getting device
device-mapper: table: 253:2: multipath: error getting device
So, did multipathd crash? Would it be possible for me to get on this machine to take a look at it myself?
Actually, with just some debugging info added, everything worked just fine, with one small issue: "multipath -l" doesn't correctly display a multipath device that big.
# multipath -l | grep sd | wc -l
55
This is similar to your Comment 8 result. However, looking at the output:
# multipath -l
...
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 59:0:0:1 sdau 66:224 active undef running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 80:0:0:1 sdbh 67:176 active undef running
|-+- policy='round-robin 0' prio=0 s
It's pretty clearly cut off mid-line. Using multipathd to look at the device
# multipathd show topology map mpathc | grep sd | wc -l
100
does show all hundred paths, and looking at /var/log/messages
Feb 7 11:21:08 storageqe-12 multipathd: mpathc: sdcp - directio checker reports path is up
Feb 7 11:21:08 storageqe-12 multipathd: 69:208: reinstated
Feb 7 11:21:08 storageqe-12 multipathd: mpathc: remaining active paths: 100
so the device does have 100 active paths. I'll fix the multipath -l size issue. The original corruption issue is more worrying.
I did expand a buffer that wasn't big enough, but I don't see where multipathd writes to it without size limiting, so unless I'm missing something, there still should be an overwrite, or a write after free, or something. However, I've been totally unable to reproduce it, even with running valgrind with all the memory zeroing options. That doesn't mean it's not there anymore, it just means the only way to find it is to notice it while reading the code. If you can reproduce the corruption, please let me know.
Thanks, if I reproduce the crash I'll let you know. When logging out of the sessions, did you reproduce this error?
iscsiadm: initiator reported error (9 - internal error)
iscsiadm: Could not logout of [sid: 119, target: iqn.2009-10.com.redhat:storage-1, portal: 127.0.0.1,3260].
iscsiadm: initiator reported error (9 - internal error)
iscsiadm: Could not logout of [sid: 118, target: iqn.2009-10.com.redhat:storage-1, portal: 127.0.0.1,3260].
iscsiadm: initiator reported error (9 - internal error)
iscsiadm: Could not logout of [sid: 117, target: iqn.2009-10.com.redhat:storage-1, portal: 127.0.0.1,3260].
*** Bug 723169 has been marked as a duplicate of this bug. ***
This should be fixed by the fix for 880121
*** This bug has been marked as a duplicate of bug 880121 ***
Description of problem:
Running the system with 100 iSCSI interfaces caused:
*** glibc detected *** /sbin/multipathd: malloc(): smallbin double linked list corrupted: 0x00007ff5046861a0 ***
Version-Release number of selected component (if applicable):
rpm -q device-mapper-multipath
device-mapper-multipath-0.4.9-63.el6.x86_64
rpm -q device-mapper
device-mapper-1.02.77-7.el6.x86_64
uname -r
2.6.32-353.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Create 100 interfaces and discover the target
for i in {1..100}; do
    iscsiadm -m iface -o new -I multi_iqn_$i;
    iscsiadm -m iface -I multi_iqn_$i -o update -n iface.initiatorname -v iqn.1994-05.com.redhat:multi-iqn-1;
    iscsiadm -m discovery -t st -p <portal> -I multi_iqn_$i -o new;
done
2. Log in to the target
iscsiadm -m node -l
3. Check the messages on the console
Actual results:
multipathd fails
Additional info:
sd 1085:0:0:0: [sdge] Attached SCSI disk
sd 1093:0:0:0: [sdgh] Attached SCSI disk
sd 1108:0:0:0: [sdgs] Attached SCSI disk
*** glibc detected *** /sbin/multipathd: malloc(): smallbin double linked list corrupted: 0x00007ff5046861a0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x760e6)[0x7ff522d910e6]
/lib64/libc.so.6(+0x79e9f)[0x7ff522d94e9f]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7ff522d95911]
/lib64/libc.so.6(__strdup+0x22)[0x7ff522d9c042]
/lib64/libdevmapper.so.1.02(+0xca93)[0x7ff523fa0a93]
/lib64/libmultipath.so(dm_type+0x47)[0x7ff5236f8d67]
/lib64/libmultipath.so(dm_get_maps+0xc9)[0x7ff5236f9199]
/lib64/libmultipath.so(dm_get_name+0x32)[0x7ff5236f9292]
/lib64/libmultipath.so(__setup_multipath+0xb8)[0x7ff5237167c8]
/lib64/libmultipath.so(add_map_without_path+0x3d)[0x7ff5237171cd]
/sbin/multipathd[0x408470]
/sbin/multipathd(uev_trigger+0x262)[0x408af2]
/lib64/libmultipath.so(service_uevq+0x64)[0x7ff52370f214]
/lib64/libmultipath.so(+0x262b7)[0x7ff52370f2b7]
/lib64/libpthread.so.0(+0x7851)[0x7ff5230b5851]
/lib64/libc.so.6(clone+0x6d)[0x7ff522e0390d]
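Each frame in the glibc backtrace above has the form `path(symbol+offset)[address]`, and some frames have no symbol. The sketch below pulls out only the frames with a resolved symbol name, which makes the call chain through libmultipath easier to read. It assumes that frame layout and is not a general backtrace parser. `bt_symbols` is a hypothetical helper name.

```shell
#!/bin/sh
# bt_symbols: hypothetical helper for reading glibc "Backtrace:" output.
# Reads frames on stdin (space- or newline-separated) and prints
# "<object> <symbol>" for every frame that carries a symbol name.
# Frames such as /lib64/libc.so.6(+0x760e6)[...] or /sbin/multipathd[...]
# have no symbol and are skipped.
bt_symbols() {
    tr -s '[:space:]' '\n' |
        sed -nE 's#^(/[^([]+)\(([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]$#\1 \2#p'
}

# Usage, with the backtrace saved to a file (the file name is an assumption):
#   bt_symbols < multipathd-backtrace.txt
```

Frames are printed in the order they appear, so the innermost call comes first.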