Bug 1169935 - Inconsistent behaviour between multipath and multipathd
Summary: Inconsistent behaviour between multipath and multipathd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-multipath
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 7.0
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks: 1248227
Reported: 2014-12-02 18:55 UTC by Harald Hoyer
Modified: 2023-03-08 07:27 UTC
CC List: 28 users

Fixed In Version: device-mapper-multipath-0.4.9-78.el7
Doc Type: Bug Fix
Doc Text:
Cause: When multipath doesn't already have a device's WWID in its wwids file, it will not claim the device immediately upon seeing it. However, it may claim the device later. Consequence: During boot, this can cause multipath to claim a device after the machine has started using it directly, which can keep it from being completely usable either through multipath or directly. Fix: multipath now has a new multipath.conf configuration option, "ignore_new_boot_devs". If this is set, multipath will never set itself up, in the initramfs, on a device that doesn't already have its WWID in the wwids file. Result: Devices whose WWID is not listed in the wwids file will always be usable directly in the initramfs.
Clone Of: 1167620
Cloned As: 1248227
Environment:
Last Closed: 2015-11-19 12:56:16 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2015:2132 (private: 0, priority: normal, status: SHIPPED_LIVE): "device-mapper-multipath bug fix and enhancement update", last updated 2015-11-19 11:21:43 UTC

Comment 2 Harald Hoyer 2014-12-02 19:03:31 UTC
To reproduce:

$ wget -c 'http://download.devel.redhat.com/brewroot/work/tasks/5462/8315462/rhev-hypervisor7-7.0-20141201.0.iso'

$ sudo qemu-kvm  -snapshot -m 2048 -smp 4 -hda ~/Downloads/rhev-hypervisor7-7.0-20141201.0.iso 

At the boot menu in qemu, press <tab> and append the following to the kernel command line:

"rd.break=initqueue rd.udev.log-priority=debug"

You will be dropped to the dracut shell.

journalctl will show you

Dec 02 18:58:12 localhost systemd-udevd[373]: PROGRAM '/sbin/multipath -c /dev/sda' /usr/lib/udev/rules.d/62-multipath.rules:15
Dec 02 18:58:12 localhost systemd-udevd[428]: starting '/sbin/multipath -c /dev/sda'
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc319c0 has devpath '/devices/pci0000:00/0000:00:01.1/ata1/host0/target0:0:0/0:0:0:0/block/sda''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc32940 has devpath '/devices/pci0000:00/0000:00:01.1/ata1/host0/target0:0:0/0:0:0:0''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc332f0 has devpath '/devices/pci0000:00/0000:00:01.1/ata1/host0/target0:0:0''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc33b70 has devpath '/devices/pci0000:00/0000:00:01.1/ata1/host0''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc34180 has devpath '/devices/pci0000:00/0000:00:01.1/ata1''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc346a0 has devpath '/devices/pci0000:00/0000:00:01.1''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_new_from_syspath: device 0xc34bb0 has devpath '/devices/pci0000:00''
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(err) 'libudev: udev_device_read_db: no db file to read /run/udev/data/b8:0: No such file or directory'
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda'(out) '/dev/sda is not a valid multipath device path'
Dec 02 18:58:12 localhost systemd-udevd[373]: '/sbin/multipath -c /dev/sda' [428] exit with return code 1


and a little bit later on:

Dec 02 18:58:12 localhost multipathd[131]: path checkers start up
Dec 02 18:58:12 localhost multipathd[131]: sda: add path (uevent)
Dec 02 18:58:12 localhost multipathd[131]: QEMU_HARDDISK_QM00001: load table [0 501760 multipath 0 0 1 1 service-time 0 1 1 8:0 1]
Dec 02 18:58:12 localhost multipathd[131]: QEMU_HARDDISK_QM00001: event checker started
Dec 02 18:58:12 localhost multipathd[131]: sda [8:0]: path added to devmap QEMU_HARDDISK_QM00001


... so "multipath -c" thinks sda is not an mpath_member, yet multipathd adds it.

Comment 3 Ben Marzinski 2014-12-02 19:43:31 UTC
I don't think this is the issue. First, multipath will fall back to getting the information from the environment:

        value = udev_device_get_property_value(pp->udev, pp->uid_attribute);
        if ((!value || strlen(value) == 0) && conf->cmd == CMD_VALID_PATH)
                value = getenv(pp->uid_attribute);

"multipath -c" sets conf->cmd to CMD_VALID_PATH.
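The fallback described above can be sketched in self-contained C. This is not multipath's code. `lookup_uid()` and the `cmd_mode` enum are hypothetical stand-ins. The udev lookup is replaced by a plain string argument (what `udev_device_get_property_value()` would have returned), so the sketch runs without libudev.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for multipath's command modes. */
enum cmd_mode { CMD_OTHER, CMD_VALID_PATH };

/*
 * udev_value: the property value udev reported for uid_attribute
 * (may be NULL or empty). Only in CMD_VALID_PATH mode ("multipath -c")
 * does the lookup fall back to the process environment, which udev
 * fills with the device's properties when it runs a PROGRAM.
 */
static const char *lookup_uid(const char *udev_value,
                              const char *uid_attribute,
                              enum cmd_mode cmd)
{
    const char *value = udev_value;

    if ((!value || strlen(value) == 0) && cmd == CMD_VALID_PATH)
        value = getenv(uid_attribute);
    return value;
}
```

In any mode other than CMD_VALID_PATH, a missing udev value stays NULL. Only the "multipath -c" path consults the environment.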

Second, there is nothing at all inconsistent between multipath -c thinking sda is not a multipath device and multipathd adding it. sda will not be counted as a valid path until its WWID is in /etc/multipath/wwids. This doesn't happen until after multipathd creates the multipath device, and that can't happen until after multipathd adds the path.

Multipathd adds every path that is not explicitly blacklisted.  With find_multipaths enabled, it will wait until two paths to the same device show up unless the wwid is in /etc/multipath/wwids.  Otherwise, it will immediately create a new multipath device with the path.  After the device has been successfully created, it will add the wwid to the wwids file.  After that, any events for the path device will set it as a multipath device.  But multipath won't preemptively claim a device that it hasn't seen before, and /etc/multipath/wwids stores the devices it has seen before.  Without the wwid in that file in the initramfs, multipath is likely to lose this race.
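The claiming rules described in this comment can be modeled as two small predicates. This is a simplified sketch, not multipath's implementation. The function names are hypothetical, and real multipath also weighs existing maps and other configuration.

```c
#include <stdbool.h>

/* "multipath -c" in the udev rule: claim only paths whose WWID is already
 * recorded in the wwids file (simplified). */
static bool multipath_c_claims(bool wwid_in_wwids_file)
{
    return wwid_in_wwids_file;
}

/*
 * multipathd: every non-blacklisted path is added. Without find_multipaths
 * a map is created immediately; with it, a map is created once the WWID is
 * known or a second path to the same device has appeared.
 */
static bool multipathd_creates_map(bool blacklisted, bool find_multipaths,
                                   bool wwid_in_wwids_file, int paths_seen)
{
    if (blacklisted)
        return false;
    if (!find_multipaths)
        return true;
    return wwid_in_wwids_file || paths_seen >= 2;
}
```

Take the case from comment 2: the initramfs configuration has no find_multipaths, the WWID is not in the wwids file, and one path has been seen. Here `multipath_c_claims(false)` is false while `multipathd_creates_map(false, false, false, 1)` is true, which is the apparent inconsistency.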

Comment 4 Fabian Deutsch 2014-12-02 20:11:20 UTC
Ben, is there a way to get rid of the race condition without a custom wwids file?

Comment 5 Harald Hoyer 2014-12-02 20:15:37 UTC
So, what we are trying to prevent is the processing of 60-persistent-storage.rules for the partitions of a multipath member, so that the /dev/disk/by-* symlinks never show up.

What we see is a race condition.

We react on /dev/disk/by-label/mylabel -> sda1

But in the meantime multipathd has taken sda, and the partitions have not yet been removed with partx, which I thought was the job of 62-multipath.rules.

The symlinks still point to sda1, which we can't use!

What exactly is 62-multipath.rules for then?

This is a stupid racy concept!

Comment 6 Harald Hoyer 2014-12-02 20:25:07 UTC
So we have a daemon which uses a blacklist to generate a whitelist for the tool in the udev rule???? 

That ACTION=="add" event is never processed if the whitelist does not contain the wwid that the blacklist allowed...

Comment 7 Ben Marzinski 2014-12-02 20:40:03 UTC
Unlike things like LVM or MD, which have a label, there is no label for multipath devices. There is no way to know, the first time we see a device, whether it can be multipathed. Once we have seen it, we record this, and from then on we can grab it right away. This is the point of 62-multipath.rules. I've already admitted that dealing with the partitions with partx is racy, but that's not the issue here. The device is correctly labelled immediately. LVM, for instance, respects the labelling and ignores the devices. Your patch for bug 1167620 does as well, and would work if that device were in the wwids file.

I'm still trying to figure out why the device is not in the wwids file.  How was this initramfs file generated?  anaconda should make sure the wwids file is present if the device was multipathed during installation.
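Here is a sketch of checking whether a WWID has been recorded. It assumes the wwids file layout of one WWID per line wrapped in slashes (for example `/3600508b400105e210000900000490000/`), with `#` starting comment lines. The function name `wwid_listed()` is hypothetical, and it works on file contents already read into memory.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if some line of contents is exactly "/<wwid>/".
 * Comment lines begin with '#', so they never match. */
static bool wwid_listed(const char *contents, const char *wwid)
{
    size_t wlen = strlen(wwid);
    const char *line = contents;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len == wlen + 2 && line[0] == '/' && line[len - 1] == '/' &&
            strncmp(line + 1, wwid, wlen) == 0)
            return true;
        line = end ? end + 1 : NULL;   /* advance to the next line */
    }
    return false;
}
```

Requiring an exact match between the slashes keeps a WWID that is a prefix of another recorded WWID from matching by accident.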

If you need to force multipath to recognize a wwid, you can always add

mpath.wwid=<WWID>

to the kernel command line. Multipathd will pick that up when it starts and immediately add the wwid to /etc/multipath/wwids.
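Here is a sketch of recognizing the `mpath.wwid=<WWID>` parameter in a kernel command line string, such as the contents of /proc/cmdline. `cmdline_forces_wwid()` is a hypothetical helper, not multipathd's parser. It splits on whitespace and ignores quoting.

```c
#include <stdbool.h>
#include <string.h>

/* True if the command line contains the token "mpath.wwid=<wwid>". */
static bool cmdline_forces_wwid(const char *cmdline, const char *wwid)
{
    static const char key[] = "mpath.wwid=";
    const size_t klen = sizeof(key) - 1;
    const size_t wlen = strlen(wwid);
    const char *p = cmdline;

    while (*p) {
        /* skip separators, then measure one whitespace-delimited token */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *tok = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        size_t len = (size_t)(p - tok);

        if (len == klen + wlen && strncmp(tok, key, klen) == 0 &&
            strncmp(tok + klen, wwid, wlen) == 0)
            return true;
    }
    return false;
}
```

Each token is checked independently, so several `mpath.wwid=` parameters on one command line are all examined.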

Comment 8 Ben Marzinski 2014-12-02 20:43:46 UTC
The alternative is to have multipath claim every block device that isn't blacklisted, and that won't be any better without users being very careful about their blacklists.  This would certainly not be doable without manual configuration on all the machines.

Comment 9 Fabian Deutsch 2014-12-02 22:10:34 UTC
(In reply to Ben Marzinski from comment #7)
…

> I'm still trying to figure out why the device is not in the wwids file.  How
> was this initramfs file generated?  anaconda should make sure the wwids file
> is present if the device was multipathed during installation.

We are not using anaconda to do the installation.

IIUIC the manual workaround is to use mpath.wwid=<WWID_OF_ROOTFS_DEVICE> to let mpath know that the device holding the rootfs is multipathed?


I wonder if there is a mechanism to synchronize the udev _and_ multipath discovery phase with the point where we finally boot from the rootfs, to ensure that all (possible) mpath devices are discovered and the correct symlinks are created. Basically, something similar to barriers in OpenCL terminology.

Comment 10 Ben Marzinski 2014-12-02 23:47:46 UTC
Like I mentioned in Bug 1167620
https://bugzilla.redhat.com/show_bug.cgi?id=1167620#c19

A big part of the problem is that the multipath.conf file in your initramfs doesn't have

defaults {
        find_multipaths yes
}

Like RHEL7 does.

This means that multipathd will create multipath devices on top of all non-blacklisted devices.  This is why you are getting multipath on top of USB devices.  Changing that file should make this a lot better.  You might still run into this issue if you are actually using a device with multiple paths. To solve that, I can make multipathd not create multipath devices on any paths whose wwid isn't already in the wwids file.  This would fix the problem, but you wouldn't get a multipathed root device.  To get that you would need to manually add the wwid and remake the initramfs after you booted up.

Making multipath get a multipathed root device automatically with a stock setup will be tricky. The issue is that with find_multipaths enabled, multipathd will not create a multipath device until it gets a uevent for the second path.  This means that it will almost always lose the race against something else trying to grab the first device.  You could make it always win by adding:

mpath.wwid=<WWID>

to the kernel command line, to force it to claim these devices immediately. But this requires knowing the wwid of the device, and manually changing the kernel command line, so it's not a stock setup either.  The only way to do it with a stock setup would be to actually wait for multipathd to set up the devices.

Comment 11 Ben Marzinski 2014-12-03 07:36:02 UTC
I've built device-mapper-multipath packages that allow multipathd to be started with "-n". When started this way, multipathd won't create multipath devices if they aren't listed in /etc/multipath/wwids. The multipathd.service file in the initramfs also needs to change

ExecStart=/sbin/multipathd

to

ExecStart=/sbin/multipathd -n

to enable this.  Like I mentioned before, this should avoid the race by making multipath not even try in cases where the device isn't in the wwids file.  If you wanted multipathed root, you would need to boot up, and then run

# multipath -a <root_devname>

to add the root device wwid to the wwids file. Then you would have to remake the initramfs to pull in the updated wwids file.

It should be possible to create some way to wait for multipathd to finish processing all the devices, assuming you knew when the last device got discovered (multipath won't know if a device it has never seen before can be multipathed until the second path shows up, so it's enough to know that it has completely processed the first path. That's the really tricky part).  However, I don't think that would be a quick fix.

Comment 12 Harald Hoyer 2014-12-03 12:03:14 UTC
(In reply to Ben Marzinski from comment #11)

> It should be possible to create some way to wait for multipathd to finish
> processing all the devices, assuming you knew when the last device got
> discovered (multipath won't know if a device it has never seen before can be
> multipathed until the second path shows up, so it's enough to know that it
> has completely processed the first path. That's the really tricky part). 
> However, I don't think that would be a quick fix.

Because there is no point in time at which we can say "the last device got discovered", this will never work.

Comment 13 Ben Marzinski 2014-12-03 15:35:29 UTC
(In reply to Ben Marzinski from comment #11)
> discovered (multipath won't know if a device it has never seen before can be
> multipathed until the second path shows up, so it's enough to know that it
> has completely processed the first path. That's the really tricky part). 

I meant to say  "... so it is NOT enough to know that it has completely processed the first path." Sort of an important typo there.

So between adding the wwid on the kernel command line, so that multipath always immediately claims the device, and running "multipathd -n" in the initramfs, so that multipath doesn't even try on devices that it doesn't recognize, we can keep from running this race at all. The only case where this isn't optimal is when you are booting on a multipathed device with a stock initramfs image, and you don't (or can't) add the wwid on the kernel command line. In this case you can fix the issue after bootup, if you are able to remake the initramfs or edit the kernel command line.

Fabian, does that seem good enough?

Harald, how do I go about making the multipathd.service file in the initramfs different from the one on the real filesystem (to run multipathd with -n in the initramfs)?  The other option would be to have multipathd detect that it is running in the initramfs, but I'm not sure how to detect that either.

Comment 15 Harald Hoyer 2014-12-08 15:14:19 UTC
(In reply to Ben Marzinski from comment #13)
> Harald, how do I go about making the multipathd.service file in the
> initramfs different from the one on the real filesystem (to run multipathd
> with -n in the initramfs)?  The other option would be to have multipathd
> detect that it is running in the initramfs, but I'm not sure how to detect
> that either.

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/util.c#n5467
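The linked systemd source is presumably its in_initrd() helper. As far as is known, the helper of this era treats the system as running in an initramfs when /etc/initrd-release exists and the root filesystem is tmpfs or ramfs.

A minimal C sketch of that check follows. `running_in_initramfs()` and `is_memory_fs()` are hypothetical names. The magic numbers are copied from <linux/magic.h> so the sketch compiles without kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/vfs.h>

/* Values of TMPFS_MAGIC and RAMFS_MAGIC from <linux/magic.h>. */
#define SKETCH_TMPFS_MAGIC 0x01021994u
#define SKETCH_RAMFS_MAGIC 0x858458f6u

/* True if a statfs() f_type value denotes a memory filesystem. */
static bool is_memory_fs(uint32_t f_type)
{
    return f_type == SKETCH_TMPFS_MAGIC || f_type == SKETCH_RAMFS_MAGIC;
}

/* Flag file present AND root filesystem on tmpfs/ramfs. */
static bool running_in_initramfs(void)
{
    struct statfs s;

    if (access("/etc/initrd-release", F_OK) != 0)
        return false;
    if (statfs("/", &s) != 0)
        return false;
    /* f_type is a signed word on some ABIs; compare as a 32-bit magic. */
    return is_memory_fs((uint32_t)s.f_type);
}
```

Checking the root filesystem type as well as the flag file guards against misdetecting a real root that happens to contain /etc/initrd-release.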

Comment 17 Ben Marzinski 2014-12-17 03:29:17 UTC
Fabian, does the configuration change to use "find_multipaths" solve this issue for you? I'm concerned that having multipath always ignore unknown devices in the initramfs will make a known anaconda issue more common, and I'd like to wait until that issue is resolved before adding this. Without multipath always ignoring unknown devices, there is theoretically a possibility that both paths will appear and multipath will get set up on the device before something else can, but in practice I have never seen multipath win this race. Also, this is only possible when there are actually multiple paths to the device and mpath.wwid was not set on the kernel command line. So if you plan to always set mpath.wwid for multipath installations, this will never be a problem for you.

If you think this is necessary, I can make this functionality configurable in /etc/multipath.conf.

Comment 18 Fabian Deutsch 2015-01-07 13:38:16 UTC
Yes, using find_multipaths solves many issues. I think we clarified many points on IRC and in other bugs and conversations.

But I cannot tell whether the original bug described in comment 1 is valid or not.

Comment 19 Ben Marzinski 2015-01-08 02:17:44 UTC
Like I mentioned, if you always set mpath.wwid when you have multipath devices, then you should never hit this, because multipath will always claim the device immediately. I do have a patch that adds a new config option, "ignore_new_boot_devs", which, if set, will make multipath only set up on devices that are already in the wwids file while it's running in the initramfs. If this bugzilla isn't needed for tracking anything else, I plan on using it for adding that. Even if RHEV doesn't need it, I can see situations where it would be useful.
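The gating that "ignore_new_boot_devs" adds can be sketched as one predicate. This is an illustrative model only, and `may_create_map()` is hypothetical. The `wwid_known` flag also covers WWIDs forced with mpath.wwid=, since multipathd adds those to the wwids file when it starts (comment 7).

```c
#include <stdbool.h>

/*
 * With ignore_new_boot_devs set and multipath running in the initramfs,
 * only WWIDs already in the wwids file may get a multipath device.
 * Returning true here does not bypass the other rules (blacklist,
 * find_multipaths); it only means this particular gate does not object.
 */
static bool may_create_map(bool ignore_new_boot_devs, bool in_initramfs,
                           bool wwid_known)
{
    if (ignore_new_boot_devs && in_initramfs)
        return wwid_known;
    return true;
}
```

Because unknown devices are never claimed in the initramfs, they remain directly usable there, and the race described earlier in this bug cannot start.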

Comment 20 Fabian Deutsch 2015-01-27 11:44:30 UTC
Right, feel free to use this bug for the new "ignore_new_boot_devs" option, otherwise I'm fine to close this bug.

Comment 23 Ben Marzinski 2015-03-30 19:28:51 UTC
I will be using this bz to add an "ignore_new_boot_devs" option

Comment 32 Lin Li 2015-09-25 03:16:12 UTC
Verified on DISTRO: RHEL-7.2-20150917.0
kernel:3.10.0-316.el7.x86_64
multipath: device-mapper-multipath-0.4.9-84.el7

change to verified.

Comment 33 errata-xmlrpc 2015-11-19 12:56:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2132.html

