Bug 1169935
| Summary: | Inconsistent behaviour between multipath and multipathd | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Harald Hoyer <harald> | |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | |
| Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.0 | CC: | agk, bmarzins, bmcclain, cshao, ecohen, fdeutsch, gklein, harald, heinzm, huiwa, iheim, jbrassow, leiwang, lilin, lilu, lsurette, msnitzer, nbarcet, prajnoha, qe-baseos-daemons, rbarry, snagar, systemd-maint-list, virt-bugs, yaniwang, yanwang, ycui, zkabelac | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | 7.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | device-mapper-multipath-0.4.9-78.el7 | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: When multipath doesn't already have a device's WWID in its wwids file, it will not claim the device immediately upon seeing it. However, it may claim the device later.
Consequence: During boot, this can cause multipath to claim a device after it has started to be used directly by the machine, which can keep it from being completely usable either through multipath or directly.
Fix: multipath now has a new multipath.conf configuration option, "ignore_new_boot_devs". If this is set, then in the initramfs multipath will never set itself up on a device that doesn't already have its WWID in the wwids file.
Result: devices whose WWID is not listed in the wwids file will always be usable directly in the initramfs.
|
Story Points: | --- | |
| Clone Of: | 1167620 | |||
| : | 1248227 (view as bug list) | Environment: | ||
| Last Closed: | 2015-11-19 12:56:16 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1248227 | |||
|
Comment 2
Harald Hoyer
2014-12-02 19:03:31 UTC
I don't think this is the issue. First off, multipath will fall back to getting the information from the environment:
value = udev_device_get_property_value(pp->udev, pp->uid_attribute);
if ((!value || strlen(value) == 0) && conf->cmd == CMD_VALID_PATH)
    value = getenv(pp->uid_attribute);
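The fallback in the excerpt above can be modelled in a few lines. This is only a sketch in Python, not multipath's code: `lookup_wwid`, the `udev_props` dict, and the two string constants are hypothetical stand-ins for `udev_device_get_property_value()`, the udev device, and the values of `conf->cmd`, and `ID_SERIAL` is used purely as an example attribute name.

```python
import os

CMD_VALID_PATH = "valid_path"   # hypothetical stand-in for multipath's CMD_VALID_PATH
CMD_OTHER = "other"             # hypothetical stand-in for any other command mode


def lookup_wwid(udev_props, uid_attribute, cmd, environ=None):
    """Return the WWID for a path, mirroring the fallback in the excerpt.

    udev_props models the udev database entry for the device (a dict).
    The environment is consulted only when the udev value is missing or
    empty AND the command is the "is this a valid path" check.
    """
    if environ is None:
        environ = os.environ
    value = udev_props.get(uid_attribute)
    if not value and cmd == CMD_VALID_PATH:
        value = environ.get(uid_attribute)
    return value
```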
"multipath -c" sets conf->cmd to CMD_VALID_PATH
Second, there is nothing at all inconsistent between multipath -c thinking sda is not a multipath device, and it being added by multipathd. sda will not be counted as a valid path until its WWID is in /etc/multipath/wwids. This doesn't happen until after the multipath device is created by multipathd, and that can't happen until after multipathd adds the path.
Multipathd adds every path that is not explicitly blacklisted. With find_multipaths enabled, it will wait until two paths to the same device show up unless the wwid is in /etc/multipath/wwids. Otherwise, it will immediately create a new multipath device with the path. After the device has been successfully created, it will add the wwid to the wwids file. After that, any events for the path device will set it as a multipath device. But multipath won't preemptively claim a device that it hasn't seen before, and /etc/multipath/wwids stores the devices it has seen before. Without the wwid in that file in the initramfs, multipath is likely to lose this race.
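The sequence described in this comment can be written down as a small state model. This is a hedged Python sketch of the description only, not multipathd's implementation: the `State` class, `handle_new_path`, and the returned strings are invented for illustration, the blacklist is reduced to a set of path names, and `known_wwids` stands in for /etc/multipath/wwids.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    blacklist: set = field(default_factory=set)    # explicitly blacklisted path names
    known_wwids: set = field(default_factory=set)  # models /etc/multipath/wwids
    pending: dict = field(default_factory=dict)    # wwid -> paths seen, no map yet
    maps: dict = field(default_factory=dict)       # wwid -> paths of a multipath device


def handle_new_path(path, wwid, state, find_multipaths=True):
    """Model what happens when the daemon sees a new path device."""
    if path in state.blacklist:
        return "ignored"
    if wwid in state.maps:
        # A multipath device already exists for this WWID.
        state.maps[wwid].append(path)
        return "added"
    paths = state.pending.setdefault(wwid, [])
    paths.append(path)
    # With find_multipaths, an unknown WWID needs a second path first.
    if find_multipaths and wwid not in state.known_wwids and len(paths) < 2:
        return "waiting"
    state.maps[wwid] = state.pending.pop(wwid)
    # Only after the device exists is the WWID recorded.
    state.known_wwids.add(wwid)
    return "created"
```

In this model a first path with an unknown WWID returns "waiting", which is the window in which something else can grab the device.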
Ben, is there a way to get rid of the race condition without a custom wwids file?

So, what we try to prevent is the processing of 60-persistent-storage.rules for the partitions of a multipath member, so that the /dev/disk/by-* symlinks never show up. What we see is a race condition. We react on /dev/disk/by-label/mylabel -> sda1. But in the meantime multipathd has taken sda, and the partitions have not yet been removed with partx, which I thought is the job of the 62-multipath.rules. The symlinks still point to sda1, which we can't use! What exactly is 62-multipath.rules for then?

This is a stupid racy concept! So we have a daemon which uses a blacklist to generate a whitelist for the tool in the udev rule???? That ACTION=="add" event is never processed, if the whitelist does not contain the wwid the blacklist allowed...

Unlike things like LVM or MD which have a label, there is no label for multipath devices. There is no way to know the first time we see a device if it can be multipathed. Once we have seen it, we record this, and from then on, we can grab it right away. This is the point of 62-multipath.rules. I've already admitted that dealing with the partitions with partx is racy, but that's not the issue here. The device is correctly labelled immediately. LVM, for instance, respects the labelling, and ignores the devices. Your patch for 1167620 would do so as well, and would work, if that device was in the wwids file.

I'm still trying to figure out why the device is not in the wwids file. How was this initramfs file generated? anaconda should make sure the wwids file is present if the device was multipathed during installation.

If you need to force multipath to recognize a wwid, you can always add mpath.wwid=<WWID> to the kernel command line.
Multipathd will pick that up when it starts and immediately add the wwid to /etc/multipath/wwids.

The alternative is to have multipath claim every block device that isn't blacklisted, and that won't be any better without users being very careful about their blacklists.

This would certainly not be doable without manual configuration on all the machines.

(In reply to Ben Marzinski from comment #7)
…
> I'm still trying to figure out why the device is not in the wwids file. How
> was this initramfs file generated? anaconda should make sure the wwids file
> is present if the device was multipathed during installation.

We are not using anaconda to do the installation.

IIUIC the manual workaround is to use mpath.wwid=<WWID_OF_ROOTFS_DEVICE> to let mpath know that the device holding the rootfs is multipathed?

I wonder if there is a mechanism by which we can synchronize between the udev _and_ multipath discovery part and the part where we finally boot from the rootfs, to ensure that all (possible) mpath devices are discovered and the correct symlinks are created. Basically similar to barriers in OpenCL terminology.

Like I mentioned in Bug 1167620
https://bugzilla.redhat.com/show_bug.cgi?id=1167620#c19

A big part of the problem is that the multipath.conf file in your initramfs doesn't have

    defaults {
        find_multipaths yes
    }

like RHEL7 does. This means that multipathd will create multipath devices on top of all non-blacklisted devices. This is why you are getting multipath on top of USB devices. Changing that file should make this a lot better.

You might still run into this issue if you are actually using a device with multiple paths. To solve that, I can make multipathd not create multipath devices on any paths whose wwid isn't already in the wwids file. This would fix the problem, but you wouldn't get a multipathed root device. To get that you would need to manually add the wwid and remake the initramfs after you booted up.
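A quick way to check whether a given multipath.conf (for example the copy unpacked from an initramfs) enables find_multipaths in its defaults section could look like the following. This is a rough Python sketch with a deliberately simple tokenizer: `find_multipaths_enabled` is a hypothetical helper, it only understands `#` comments and brace-delimited sections, and it treats only the value `yes` as enabled. It is not how multipath itself parses its configuration.

```python
def find_multipaths_enabled(conf_text):
    """Return True if the top-level defaults section sets find_multipaths yes."""
    tokens = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        # Make braces standalone tokens so one-line sections work too.
        tokens.extend(line.replace("{", " { ").replace("}", " } ").split())

    depth = 0
    in_defaults = False
    for i, tok in enumerate(tokens):
        if tok == "{":
            depth += 1
            if depth == 1:
                in_defaults = i > 0 and tokens[i - 1] == "defaults"
        elif tok == "}":
            depth -= 1
            if depth == 0:
                in_defaults = False
        elif tok == "find_multipaths" and in_defaults and depth == 1:
            return i + 1 < len(tokens) and tokens[i + 1].strip('"') == "yes"
    return False
```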
Making multipath get a multipathed root device automatically with a stock setup will be tricky. The issue is that with find_multipaths enabled, multipathd will not create a multipath device until it gets a uevent for the second path. This means that it will almost always lose the race against something else trying to grab the first device. You could make it always win by adding:

    mpath.wwid=<WWID>

to the kernel command line, to force it to claim these devices immediately. But this requires knowing the wwid of the device, and manually changing the kernel command line, so it's not a stock setup either. The only way to do it with a stock setup would be to actually wait for multipathd to set up the devices.

I've built device-mapper-multipath packages that allow multipathd to be started with "-n". When this happens, multipath won't create multipath devices if they aren't listed in /etc/multipath/wwids. The multipathd.service file in the initramfs also needs to change

    ExecStart=/sbin/multipathd

to

    ExecStart=/sbin/multipathd -n

to enable this. Like I mentioned before, this should avoid the race by making multipath not even try in cases where the device isn't in the wwids file. If you wanted multipathed root, you would need to boot up, and then run

    # multipath -a <root_devname>

to add the root device wwid to the wwids file. Then you would have to remake the initramfs to pull in the updated wwids file.

It should be possible to create some way to wait for multipathd to finish processing all the devices, assuming you knew when the last device got discovered (multipath won't know if a device it has never seen before can be multipathed until the second path shows up, so it's enough to know that it has completely processed the first path. That's the really tricky part). However, I don't think that would be a quick fix.
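The two mechanisms in this comment, forcing a WWID via the kernel command line and restricting the daemon to known WWIDs with "-n", can be combined into one small decision function. The following Python sketch is illustrative only: `wwids_from_cmdline` and `may_create_in_initramfs` are hypothetical names, the `only_known` flag models starting multipathd with "-n", and the sketch assumes that every `mpath.wwid=` occurrence on the command line counts.

```python
def wwids_from_cmdline(cmdline):
    """Collect every mpath.wwid=<WWID> value from a kernel command line string."""
    prefix = "mpath.wwid="
    return [
        arg[len(prefix):]
        for arg in cmdline.split()
        if arg.startswith(prefix) and len(arg) > len(prefix)
    ]


def may_create_in_initramfs(wwid, wwids_file, cmdline, only_known=True):
    """Return True if a multipath device may be set up for this WWID.

    wwids_file is the list of WWIDs from the initramfs copy of the wwids
    file; only_known=True models running the daemon with "-n".
    """
    known = set(wwids_file) | set(wwids_from_cmdline(cmdline))
    if wwid in known:
        return True
    # Unknown WWID: with "-n" the daemon does not even try.
    return not only_known
```

With `only_known=True`, a WWID is either claimed immediately or never touched, so the race described above is not run at all in this model.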
(In reply to Ben Marzinski from comment #11)
> It should be possible to create some way to wait for multipathd to finish
> processing all the devices, assuming you knew when the last device got
> discovered (multipath won't know if a device it has never seen before can be
> multipathed until the second path shows up, so it's enough to know that it
> has completely processed the first path. That's the really tricky part).
> However, I don't think that would be a quick fix.

Because there is no point in time where we can say "last device got discovered", this will never work.

(In reply to Ben Marzinski from comment #11)
> discovered (multipath won't know if a device it has never seen before can be
> multipathed until the second path shows up, so it's enough to know that it
> has completely processed the first path. That's the really tricky part).

I meant to say "... so it is NOT enough to know that it has completely processed the first path." Sort of an important typo there.

So between adding the wwid on the kernel command line so that multipath always immediately claims the device, and running "multipathd -n" in the initramfs, so that multipath doesn't even try on devices that it doesn't recognize, we can keep from running this race at all. The only case where this isn't optimal is when you are booting on a multipathed device with a stock initramfs image, and you don't (or can't) add the wwid on the kernel command line. In this case you can fix the issue after bootup, if you are able to remake the initramfs or edit the kernel command line.

Fabian, does that seem good enough?

Harald, how do I go about making the multipathd.service file in the initramfs different from the one on the real filesystem (to run multipathd with -n in the initramfs)? The other option would be to have multipathd detect that it is running in the initramfs, but I'm not sure how to detect that either.
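One possible way to detect the initramfs, sketched in Python under the assumption that the image carries an /etc/initrd-release flag file (dracut-built images do; other generators may not). `in_initramfs` and `multipathd_args` are hypothetical helpers, the `root` parameter exists only so the check can be exercised against a scratch directory, and none of this is what multipathd itself does.

```python
import os


def in_initramfs(root="/"):
    """Guess whether we are running inside an initramfs.

    Assumption: the initramfs contains the flag file etc/initrd-release.
    """
    return os.path.exists(os.path.join(root, "etc", "initrd-release"))


def multipathd_args(root="/"):
    """Build the daemon command line, adding -n only in the initramfs."""
    args = ["/sbin/multipathd"]
    if in_initramfs(root):
        args.append("-n")
    return args
```

Doing the check at start-up would let the same unit file be used in the initramfs and on the real root filesystem, at the cost of relying on the flag-file convention.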
(In reply to Ben Marzinski from comment #13)
> Harald, how do I go about making the multipathd.service file in the
> initramfs different from the one on the real filesystem (to run multipathd
> with -n in the initramfs)? The other option would be to have multipathd
> detect that it is running in the initramfs, but I'm not sure how to detect
> that either.

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/util.c#n5467

Fabian, does the configuration change to use "find_multipaths" solve this issue for you?

I'm concerned that having multipath always ignore unknown devices in the initramfs will make a known anaconda issue more common, and I'd like to wait until that issue is resolved before adding this. Without multipath always ignoring unknown devices, there is theoretically a possibility that both paths will appear, and multipath will get set up on the device before something else can, but in practice I have never seen multipath win this race. Also, this is only possible in the case where there are actually multiple paths to the device, and mpath.wwid was not set on the kernel command line. So if you plan to always set mpath.wwid for multipath installations, this will never be a problem for you. If you think this is necessary, I can make this functionality configurable in /etc/multipath.conf.

Yes, using find_multipaths solves many issues. I think we clarified many points on IRC and in other bugs and conversations. But I cannot tell if the original bug described in comment 1 is valid or not.

Like I mentioned, if you always set mpath.wwid when you have multipath devices, then you should never hit this, because multipath will always claim the device immediately. I do have a patch that adds a new config option "ignore_new_boot_devs", which, if set, will make multipath only set up on devices that are already in the wwids file while it's running in the initramfs.
If this bugzilla isn't needed for tracking anything else, I plan on using it for adding that. Even if RHEV doesn't need it, I can see situations where it would be useful.

Right, feel free to use this bug for the new "ignore_new_boot_devs" option, otherwise I'm fine to close this bug.

I will be using this bz to add an "ignore_new_boot_devs" option.

Verified on
DISTRO: RHEL-7.2-20150917.0
kernel: 3.10.0-316.el7.x86_64
multipath: device-mapper-multipath-0.4.9-84.el7

change to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2132.html