Bug 1323429

Summary: multipathd crashed during iSCSI login logout tests
Product: Red Hat Enterprise Linux 7
Reporter: shivamerla1 <shiva.krishna>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Zhang Yi <yizhan>
Severity: high
Docs Contact: Steven J. Levine <slevine>
Priority: unspecified
Version: 7.1
CC: agk, bgoncalv, bmarzins, heinzm, lilin, msnitzer, prajnoha, shiva.krishna, yizhan
Target Milestone: rc
Keywords: OtherQA
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Multipath no longer crashes when a uid_attribute is changed after a device is added and the device is then removed.

Previously, if a path changed its WWID after being added to a multipath device, the `multipathd` daemon would create a new device. This led to the path being in both devices. As a consequence, if users changed the `uid_attribute` after multipath devices were created and then removed the devices, `multipathd` would try to access freed memory and crash. With this fix, `multipathd` no longer allows the path's WWID to be changed while it is being used in a multipath device. As a result, `multipathd` no longer crashes in this scenario.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 08:19:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  core files and sosreport (flags: none)
  rhel6.7 crash (flags: none)
  patch to keep multipath from creating two devices with the same path (flags: none)

Description shivamerla1 2016-04-02 21:55:37 UTC
Created attachment 1142883 [details]
core files and sosreport

Description of problem:
The multipathd daemon crashed during iSCSI login/logout tests.

Version-Release number of selected component (if applicable):
[root@hitdev-shiva-rhel71 abrt]# rpm -qa | grep multipath
device-mapper-multipath-libs-0.4.9-77.el7.x86_64
device-mapper-multipath-debuginfo-0.4.9-77.el7.x86_64
device-mapper-multipath-0.4.9-77.el7.x86_64

[root@hitdev-shiva-rhel71 abrt]# uname -r
3.10.0-229.el7.x86_64

How reproducible:
Rarely.

Steps to Reproduce:
1. Map around 100 iSCSI-based volumes to the host.
2. Perform iscsiadm login/logout tests (see the sketch after this list).
3. Observe the multipathd crash.
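
A minimal login/logout stress loop for step 2 might look like the following (a sketch only: the iteration count and sleep intervals are hypothetical, and it assumes the iSCSI nodes have already been discovered):

# repeatedly drop and re-establish all iSCSI sessions while multipathd
# processes the resulting remove/add uevents
for i in $(seq 1 100); do
    iscsiadm -m node --logoutall=all
    sleep 10
    iscsiadm -m node --loginall=all
    sleep 30
done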

Actual results:
multipathd crashed and dumped core.

Expected results:
multipathd should stay up and running.

Additional info:


(gdb) bt
#0  0x00007f78746c55d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f78746c6cc8 in __GI_abort () at abort.c:90
#2  0x00007f7874705e07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f787480e8c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f787470d1fd in malloc_printerr (ptr=<optimized out>, str=0x7f787480bfd4 "free(): invalid pointer", action=3) at malloc.c:4972
#4  _int_free (av=0x7f7874a4a760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3804
#5  0x00007f7874ea3a61 in verify_paths (mpp=mpp@entry=0x7f786400c830, vecs=vecs@entry=0x7f78763a52b0, rpvec=rpvec@entry=0x0) at structs_vec.c:517
#6  0x00007f7875b96467 in ev_add_path (pp=0x7f785c4d7300, vecs=vecs@entry=0x7f78763a52b0) at main.c:491
#7  0x00007f7875b96cac in uev_add_path (uev=0x7f786c111a10, uev=0x7f786c111a10, vecs=0x7f78763a52b0) at main.c:443
#8  uev_trigger (uev=0x7f786c111a10, trigger_data=0x7f78763a52b0) at main.c:834
#9  0x00007f7874e9a2a2 in service_uevq (tmpq=tmpq@entry=0x7f7875b0ddc0) at uevent.c:118
#10 0x00007f7874e9a390 in uevent_dispatch (uev_trigger=uev_trigger@entry=0x7f7875b96770 <uev_trigger>, trigger_data=<optimized out>) at uevent.c:178
#11 0x00007f7875b94f53 in uevqloop (ap=<optimized out>) at main.c:863
#12 0x00007f7875756df5 in start_thread (arg=0x7f7875b0e700) at pthread_create.c:308
#13 0x00007f78747861ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 2 Ben Marzinski 2016-04-04 22:51:21 UTC
Well, something is certainly messed up. Looking at the path that is failing to get deleted:

(gdb) print *pp
$20 = {
  dev = "\260Lzdx\177\000\000\300\232\025dx\177", '\000' <repeats 241 times>,
  dev_t = "134:544", '\000' <repeats 25 times>, udev = 0x0, sg_id = {
    host_no = 1213, channel = 0, scsi_id = 0, lun = 0, h_cmd_per_lun = 0,
    d_queue_depth = 0, proto_id = SCSI_PROTOCOL_UNSPEC, transport_id = 0},
  geom = {heads = 34 '"', sectors = 61 '=', cylinders = 1011, start = 0},
  wwid = "25dc34acf3048c51a6c9ce9004d9d6fdc_2", '\000' <repeats 92 times>,
  vendor_id = "Nimble  ", product_id = "Server          ", rev = "1.0 ",
  serial = "5dc34acf3048c51a6c9ce9004d9d6fdc", '\000' <repeats 32 times>,
  tgt_node_name = '\000' <repeats 223 times>, size = 2097152, checkint = 5,
  tick = 4, bus = 1, offline = 0, state = 2, dmstate = 1, chkrstate = 2,
  failcount = 2, priority = 50, pgindex = 1, detect_prio = 1,
  uid_attribute = 0x7f7864027b00 "NIMBLE_SERIAL_MAGID", prio = {handle = 0x0,
    refcount = 0, node = {next = 0x0, prev = 0x0},
    name = '\000' <repeats 15 times>, args = '\000' <repeats 254 times>,
    getprio = 0x0}, prio_args = 0x0, checker = {node = {next = 0x0,
      prev = 0x0}, handle = 0x0, refcount = 0, fd = 0, sync = 0, timeout = 0,
    disable = 0, name = '\000' <repeats 15 times>,
    message = '\000' <repeats 255 times>, wwid = '\000' <repeats 127 times>,
    context = 0x0, mpcontext = 0x0, check = 0x0, init = 0x0, free = 0x0},
  mpp = 0x7f786403a2c0, fd = 14, hwe = 0x7f786400f130}

The first attribute, dev, is garbage (dev = "\260Lzdx\177\000\000\300\232\025dx\177"). It should say something like "sda". So it looks like it's possible that this structure has already been freed and partially overwritten, which would explain the failed free.

Aside from the beginning of the structure, the rest looks uncorrupted, but there are some oddities in it. First, the multipath device that the path points to (mpp = 0x7f786403a2c0) isn't the one that is being verified. It also doesn't have the same WWID as the path. The path's WWID is "25dc34acf3048c51a6c9ce9004d9d6fdc_2". The WWID of the device at 0x7f786403a2c0 is "25dc34acf3048c51a6c9ce9004d9d6fdc". Looking at the device that verify_paths was called on, 0x7f786400c830, it does have the same WWID as the path, "25dc34acf3048c51a6c9ce9004d9d6fdc_2", and clearly, the path was in its list of paths. So for some reason the path seems to have been pointed at the wrong multipath device.
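
For reference, the two WWIDs can be compared directly in the core along these lines, where the first address is the device verify_paths was called on and the second is the device the freed path points to (a sketch; it assumes the struct multipath type is visible through the installed debuginfo):

(gdb) print ((struct multipath *) 0x7f786400c830)->wwid
(gdb) print ((struct multipath *) 0x7f786403a2c0)->wwid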

Now, I realize that you can't trust the values in a freed structure, but these values don't look corrupted, just wrong. Even more interesting, "25dc34acf3048c51a6c9ce9004d9d6fdc" differs from "25dc34acf3048c51a6c9ce9004d9d6fdc_2" only by the "_2". I also see that this device is using a different uid_attribute (NIMBLE_SERIAL_MAGID) from the standard ID_SERIAL. There is no chance that the value of NIMBLE_SERIAL_MAGID can change for a device, correct? If that value isn't persistent, that could be the cause of the problem. Even if it is persistent, I wonder if something is not properly dealing with the different uid_attribute. But unless a very unlikely coincidence made the freed path appear to point to the wrong multipath device, that mismatch between the device the path thinks it belongs to and the device that actually owns it explains the double free.

I'll keep looking to see if I can figure out how those two could get mismatched. But please let me know if NIMBLE_SERIAL_MAGID is not persistent. Also, if you could verify that this is still reproducible on a more recent release, that would be nice, but not necessary, since I don't know of any changes that are likely to have fixed this.

Comment 3 shivamerla1 2016-04-04 23:54:05 UTC
Ben, 

NIMBLE_SERIAL_MAGID udev attribute is populated during path discovery.

KERNEL=="sd*[!0-9]", ACTION=="add", ENV{ID_SERIAL}=="*6c9ce9*", PROGRAM="/opt/NimbleStorage/bin/lower_tier_dm.sh /dev/%k $env{ID_SERIAL}", ENV{NIMBLE_SERIAL_MAGID}="%c"

All this does is send an inquiry to the array over that path and append the array ID (in this case 2) to the UUID (hence UUID_2). But once this is assigned at discovery, it should not change on "change" events.
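
For context, a minimal sketch of the matching multipath.conf change that points uid_attribute at this udev property (the vendor and product strings come from the gdb dump in comment 2; the exact stanza is illustrative, not necessarily our production configuration):

cat >> /etc/multipath.conf <<'EOF'
devices {
        device {
                vendor        "Nimble"
                product       "Server"
                uid_attribute "NIMBLE_SERIAL_MAGID"
        }
}
EOF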

Comment 4 Ben Marzinski 2016-04-05 22:38:21 UTC
So, digging into the messages more, it's pretty obvious that the scsi devices think they belong to mpathlv but actually belong to mpathfd. At some point in time, I believe all the paths get deleted, causing mpathlv to try to flush and delete itself, but it is in use. After that, the only thing holding a reference to the paths is mpathfd, which causes multipathd to crash. Figuring out exactly what is going on is complicated by journald dropping a large number of multipathd messages.

To fix that, could you please set

RateLimitBurst=0

in /etc/systemd/journald.conf, and then restart the journald with

# systemctl restart systemd-journald

and recreate this issue. Then send me the core files and sosreport just like before.
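
A compact version of those two steps (the sed pattern assumes the stock RateLimitBurst line, possibly commented out, in journald.conf):

# disable journald rate limiting so multipathd messages are not dropped
sed -i 's/^#\?RateLimitBurst=.*/RateLimitBurst=0/' /etc/systemd/journald.conf
systemctl restart systemd-journald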

Also, when I look at var/log/messages and "sos_commands/logs/journalctl_--all_--this-boot_--no-pager" in your sosreport, they don't go back all the way to boot time, which probably cuts off a lot of useful information about how things got set up. I'm not sure why this is, but hopefully it won't happen when you reproduce this.

Also, do you need over 100 devices to reproduce this, or can it be reproduced with a significantly smaller number? The fewer devices there are, the less information there is to sift through. But if you can't reproduce it with fewer, that's fine.

Comment 5 Ben Marzinski 2016-04-06 22:31:25 UTC
I'm still not sure what's happening here. There seem to be only two options: either the paths are getting set to point at a multipath device that doesn't match their WWID, or the WWID of the multipath device is getting changed. The only way the first one could happen would be if a multipath device was created outside of multipathd (say, by running the multipath command) and, due to some misconfiguration, it had the wrong devices. I'm not sure that the second one can happen at all, but it might be possible if some of the devices were created in the initramfs and some of them were created later, with different configurations.

Do you ever run the multipath command during your tests?
Do you know if some of the devices are getting created by the initramfs? If so, could you remake your initramfs? This will pull in your current multipath configuration, so that the two match.

There have been some fixes to deal with issues caused by differing configurations in the initramfs and the regular filesystem. Also, there are some places where more recent versions of the package print more useful logging information. So, if it would be possible to test on a more recent release, that would be helpful.

The other possibility is for me to make some test packages that will hopefully log messages when things get messed up, so we can figure out where that is. Are you able to run debug packages in your tests?

I have been completely unable to reproduce this, but then again, I don't have the same storage as you do.

Comment 6 shivamerla1 2016-04-07 03:10:45 UTC
No, we don't use the multipath command to discover paths. But we have noticed that mpath devices with both UUID and UUID_ArrayID (2 in this case) are present in some cases. That is, by default multipathd uses ID_SERIAL, and later our application changes this to UUID_ARRAYID. In those cases, when we do a multipathd reconfigure, we see both maps present. Maybe that is what caused this.

We append the ARRAY_ID to ID_SERIAL because we are trying to create a scale-out configuration using dm-switch on top of dm-multipath. The volume (LUN) spans multiple arrays.

dm-switch
  +
  +-- dm-multipath (UUID_1)
  |
  +-- dm-multipath (UUID_2)
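
For illustration, a stack like this could be assembled with dmsetup roughly as follows (a sketch only: the map names, device size, and region size are hypothetical, following the kernel's dm-switch constructor, switch <num_paths> <region_size> <num_optional_args> [<dev> <offset>]...):

# build a dm-switch device over the two per-array multipath maps
SECTORS=209715200    # hypothetical total size (100 GiB) in 512-byte sectors
dmsetup create scaleout-vol --table \
  "0 $SECTORS switch 2 2048 0 /dev/mapper/mpath_UUID_1 0 /dev/mapper/mpath_UUID_2 0"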


I will try to recreate this with fewer volumes and upload the dump again.

Comment 7 Lin Li 2016-04-11 03:00:14 UTC
Hello shivamerla1,
Because there is no Nimble storage in our lab, could you help by providing the test result once the fix patch is available?
Thanks!

Comment 8 shivamerla1 2016-04-13 00:57:04 UTC
Yes, we can verify the fix.

Comment 10 shivamerla1 2016-04-21 20:56:29 UTC
Tried this on RHEL 6.7 and hit the multipathd crash again; core dump attached. The steps follow.

Initial multipath configuration, with the devices in use by LVM and mounted filesystems:

-bash-4.1# multipath -ll
mpathep (214891a670db78dfc6c9ce900b21942a0) dm-4 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 5:0:0:0 sdd 8:48 active ready running
  `- 6:0:0:0 sde 8:64 active ready running
mpatheo (2cbc7cc0c125b94db6c9ce900b21942a0) dm-3 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  `- 3:0:0:0 sdb 8:16 active ready running
mpathen (2fadc0640674f6b6a6c9ce900b21942a0) dm-2 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  `- 4:0:0:0 sdc 8:32 active ready running


After appending _1 to each multipath device's WWID (via the uid_attribute parameter) and doing a multipathd reconfigure:

-bash-4.1# multipath -ll
mpather (2cbc7cc0c125b94db6c9ce900b21942a0_1) dm-9 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  `- 3:0:0:0 sdb 8:16 active ready running
mpatheq (2fadc0640674f6b6a6c9ce900b21942a0_1) dm-8 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 4:0:0:0 sdc 8:32 active ready running
  `- 7:0:0:0 sdf 8:80 active ready running
mpathep (214891a670db78dfc6c9ce900b21942a0) dm-4 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 5:0:0:0 sdd 8:48 active ready running
  `- 6:0:0:0 sde 8:64 active ready running
mpatheo (2cbc7cc0c125b94db6c9ce900b21942a0) dm-3 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  `- 3:0:0:0 sdb 8:16 active ready running
mpathen (2fadc0640674f6b6a6c9ce900b21942a0) dm-2 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  `- 4:0:0:0 sdc 8:32 active ready running
mpathes (214891a670db78dfc6c9ce900b21942a0_1) dm-10 Nimble,Server
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 5:0:0:0 sdd 8:48 active ready running
  `- 6:0:0:0 sde 8:64 active ready running



Now, when we run "udevadm trigger", multipathd crashes.
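
A condensed sketch of that reproduction sequence (the configuration edit is schematic; see the uid_attribute stanza sketched in comment 3):

# 1. point uid_attribute at the vendor udev property in /etc/multipath.conf
# 2. reload the configuration while the old maps are still held open
multipathd -k"reconfigure"
# 3. replay uevents for the block devices; this is what crashes multipathd
udevadm trigger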

Comment 11 shivamerla1 2016-04-21 20:57:08 UTC
Created attachment 1149584 [details]
rhel6.7 crash.

Comment 12 Ben Marzinski 2016-06-03 21:10:20 UTC
So, what you do in Comment 10 is completely unsafe to do with the current multipath tools.  multipathd assumes that each path is only related to one multipath device. Changing this assumption is not simple. I can fix multipathd to avoid this crash, but I'd be doing so by preventing you from creating two multipath devices with the same paths.

I realize the dm-switch documentation talks about a setup like you describe, but currently the multipath tools don't support this, and it's not likely that they will in the RHEL7 timeframe. It is possible to create this setup outside of multipathd, but then you have no path restoration and failback.

Comment 13 shivamerla1 2016-06-03 21:29:44 UTC
Ben, 

This happens only when the dm-multipath devices are already in use and a multipathd reconfigure is not able to remove the old maps. In these cases, we require users to reboot the host after changing the getuid_callout to the vendor-specific one that appends the ARRAYID to the serial. With this, the multipath maps are created correctly for each array in our group and work well with the dm-switch mapping, and the same paths no longer belong to multiple maps. Do you still see any caveats with this?

Comment 14 Ben Marzinski 2016-06-06 20:23:52 UTC
So, if I fix this in a way that makes multipathd unable to create a new map with paths that are already in use in another map with a different WWID, you are fine with that? If so, then sure, I can fix this.

Comment 15 shivamerla1 2016-06-06 21:22:06 UTC
That breaks the getuid_callout functionality, right? The way it is today, users have the flexibility to change getuid_callout at any time and reconfigure. Ideally, instead of creating a new dm-multipath device when the paths are already configured under an existing dm-multipath device, could the existing device's parameters (dm/uuid, /dev/disk/by-id links, etc.) be updated with the new values?

Comment 16 Ben Marzinski 2016-06-06 21:38:18 UTC
(In reply to shivamerla1 from comment #15)
> That breaks the getuid_callout functionality, right? The way it is today,
> users have the flexibility to change getuid_callout at any time and reconfigure.

This is true, and users will still have that option, but there are already cases in multipath where some configuration values will not get updated on existing devices: the devices must first be deleted and recreated. This would just be another case like that. You are completely able to do the reconfiguration, but until you remove the devices that hold the paths under their old WWID, you won't be able to create new devices with the new WWID.


> Ideally, instead of creating a new dm-multipath device when the paths are
> already configured under an existing dm-multipath device, could the existing
> device's parameters (dm/uuid, /dev/disk/by-id links, etc.) be updated with
> the new values?

This isn't possible. Device-mapper itself specifically disallows changing the UUID, because it would break all sorts of things.

Comment 17 shivamerla1 2016-06-06 21:59:37 UTC
OK, got it. I am not sure we want the change that prevents the new device creation; that might break our application. We have made it a requirement that a system reboot is necessary to support this kind of multipath solution. Can I get a private patch to verify its impact on our application?

Comment 18 Ben Marzinski 2016-06-07 18:56:41 UTC
Created attachment 1165736 [details]
patch to keep multipath from creating two devices with the same path.

This patch just verifies that when a path is part of a multipath device read in from the kernel, its WWID matches the WWID of the multipath device. If it doesn't, the path's WWID is changed to match the multipath device's WWID, so no new device is created.

Comment 19 shivamerla1 2016-06-07 20:03:06 UTC
Thanks, will give this a try and verify our application behavior.

Comment 20 shivamerla1 2016-06-13 20:01:15 UTC
I have verified the patch, and it works well with our application. We are able to handle this scenario and continue to use the old map. Path removal and addition (remove-path and add-path event handling), however, will create new maps with the updated UID. Since a reboot is always suggested with this, the fix seems fine to prevent the multipathd crash.

Comment 24 Steven J. Levine 2016-10-13 20:26:29 UTC
Ben: For the release notes, I needed to add a summary statement to the doc text (and combine the rest of the text into a paragraph).  Could you review what I wrote here?  This is what I used as a summary sentence for the Bug Fix section of the 7.3 release notes:

The `multipathd` daemon no longer crashes when a uid_attribute is changed for a device after the device is added to a multipath device and then removed

The description is now as follows:

Previously, if a path changed its WWID after being added to a multipath device, the `multipathd` daemon would create a new device. This led to the path being in both devices. As a consequence, if users changed the `uid_attribute` after multipath devices were created and then removed the devices, `multipathd` would try to access freed memory and crash.  With this fix, `multipathd` no longer allows the path's WWID to be changed while it is being used in a multipath device. As a result, `multipathd` no longer crashes in this scenario.

Comment 25 Ben Marzinski 2016-10-14 17:57:35 UTC
Looks great.

Comment 27 errata-xmlrpc 2016-11-04 08:19:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2536.html

Comment 28 Steven J. Levine 2016-11-21 17:32:18 UTC
Editing the first sentence to keep it under 120 characters. The full explanation is still in the doc text.