Bug 1167620 - [RHEV-H 7] Failed to boot and report "Dracut: FATAL:Failed to mount block device of live image, System halted" / Race between device-mapper-multipath and udev
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 7.0
Assignee: Harald Hoyer
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On: 1152948
Blocks: 784395 rhevh-7.0 1155957
Reported: 2014-11-25 07:52 UTC by Fabian Deutsch
Modified: 2015-07-01 10:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1152948
Clones: 1169935
Environment:
Last Closed: 2015-07-01 10:46:21 UTC
Target Upstream Version:


Attachments
Proposed patch (1.02 KB, patch)
2014-12-01 13:35 UTC, Harald Hoyer
Proposed patch (1.02 KB, patch)
2014-12-01 13:37 UTC, Harald Hoyer

Comment 1 Fabian Deutsch 2014-11-26 14:06:44 UTC
I am proposing this as a blocker, because this is mandatory to get the RHEV-H 7 installation routines working.

The workaround is simply to disable multipath in the initrd, but this is obviously not an option.
And because we cannot disable multipath in the initrd, we need to fix the races.

Comment 4 Ben Marzinski 2014-11-26 16:01:22 UTC
Multipath should be labelling these devices as claimed as soon as the device is discovered by the udev rules.

The multipath rule: /lib/udev/rules.d/62-multipath.rules

should be setting

ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member"

on the devices.  If the boot logic could ignore these devices, then multipath should be able to create them.  If you didn't want to have to wait an indeterminate amount of time for the creation, you could simply run

multipathd add path <devname>

when you saw that a device was claimed by multipath.  This would guarantee that multipathd saw the device and created an appropriate multipath device for it.
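
A minimal shell sketch of that check, assuming the device's properties are read with `udevadm info --query=property` and the device is given by its kernel name (for example sda). The helper names `is_mpath_claimed` and `add_path_if_claimed` are hypothetical and not part of multipath-tools.

```shell
# Hypothetical helper: reads udev properties (KEY=VALUE lines) on stdin and
# succeeds if the device has been marked as a multipath path.
is_mpath_claimed() {
    grep -q -x 'DM_MULTIPATH_DEVICE_PATH=1'
}

# Hypothetical wrapper: if the device is claimed, ask multipathd to add it
# right away instead of waiting for multipathd to pick it up on its own.
add_path_if_claimed() {
    dev=$1    # kernel device name, e.g. sda
    if udevadm info --query=property --name="/dev/$dev" | is_mpath_claimed; then
        multipathd add path "$dev"
    fi
}
```

The boot logic would call something like `add_path_if_claimed sda` before deciding which node to mount.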

Actually changing how multipath creates devices (which has been this way since RHEL5) seems like it runs the risk of causing more problems than it solves in the near term.

Comment 5 Ben Marzinski 2014-11-26 16:54:22 UTC
Would it be possible to post the initramfs file for me to take a look at? Assuming that the wwid for this device is in /etc/multipath/wwids in the initramfs, the partitions should be getting removed from the device by /lib/udev/rules.d/62-multipath.rules.

ENV{DM_MULTIPATH_DEVICE_PATH}=="1", ENV{DM_MULTIPATH_WIPE_PARTS}="1", \
        RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

I realize that because the partx command isn't run immediately, the partition devices could still get created before partx removes them, and that should probably be fixed.  But I'd like to verify that the wwid really is in the wwids file.
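
A minimal sketch of that verification, assuming the wwids file has already been extracted from the initramfs (for example with dracut's lsinitrd) and that entries are stored one per line as /<wwid>/. The helper name `wwid_is_known` is hypothetical.

```shell
# Hypothetical helper: succeeds if WWID is listed in the given wwids file.
# Usage: wwid_is_known WWID [FILE]   (FILE defaults to /etc/multipath/wwids)
wwid_is_known() {
    wwid=$1
    file=${2:-/etc/multipath/wwids}
    # -x: whole-line match, -F: treat the pattern as a fixed string
    grep -q -x -F "/$wwid/" "$file" 2>/dev/null
}
```

A missing or empty file simply makes the check fail, which matches the case where multipath has no record of the device.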

Comment 6 Ryan Barry 2014-11-26 17:38:37 UTC
You can grab it here:

http://rbarry.org/initrd0.img

Comment 7 Fabian Deutsch 2014-11-27 14:18:06 UTC
IIUIC, but Harald may correct me, then the problem is currently that:

1. The host comes up rootfs=live:LABEL=Rootfs
   So we are looking for a fs labeled Rootfs
2. Raw device is discovered, symlink created
3. dracut continues
4. Dracut chooses fs on raw device partition for boot
5. multipath kicks in (via udev) and starts "owning" the device
   no further access to the raw device or its partitions is possible anymore
6. dracut fails to switch rootfs because it wants to use the raw device, and
   not the newly constructed mpath device

I can imagine that it was like this for a long time.
But we are seeing more and more problems since RHEL 6.5 now.
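
The failure in steps 4 to 6 comes down to which node the label resolves to at the moment dracut mounts it. A minimal sketch for checking that from a debug shell, assuming `readlink -f` is available there. `classify_dev` and `label_backing` are hypothetical helper names, and "mapped" only means a device-mapper node, which is not necessarily a multipath device.

```shell
# Hypothetical helper: classify a resolved device node path.
# Device-mapper nodes show up as /dev/dm-N or /dev/mapper/NAME.
classify_dev() {
    case $1 in
        /dev/dm-[0-9]*|/dev/mapper/*) echo mapped ;;
        *) echo raw ;;
    esac
}

# Hypothetical wrapper: report what a by-label symlink currently points at.
label_backing() {
    classify_dev "$(readlink -f "/dev/disk/by-label/$1")"
}
```

Running `label_backing Rootfs` shortly before the mount would show whether the symlink still points at the raw partition.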

Comment 8 Harald Hoyer 2014-11-28 09:56:29 UTC
(In reply to Fabian Deutsch from comment #7)
> IIUIC, but Harald may correct me, then the problem is currently that:
> 
> 1. The host comes up rootfs=live:LABEL=Rootfs
>    So we are looking for a fs labeled Rootfs
> 2. Raw device is discovered, symlink created
> 3. dracut continues
> 4. Dracut chooses fs on raw device partition for boot
> 5. multipath kicks in (via udev) and starts "owning" the device
>    no further access to the raw device or its partitions is possible anymore
> 6. dracut fails to switch rootfs because it wants to use the raw device,
>    and not the newly constructed mpath device
> 
> I can imagine that it was like this for a long time.
> But we are seeing more and more problems since RHEL 6.5 now.

correct, even though:

RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

is run, it seems that this triggers new events too late. It probably should also prevent all the filesystem-based symlinks.

We might accomplish that by splitting 60-persistent-storage.rules into a "data gathering" part and a "symlink setting" part, so that 62-multipath.rules can run in between and the setting of ENV{ID_FS_TYPE}="mpath_member" is honored by the symlink phase, which reads:

ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

Comment 9 Harald Hoyer 2014-12-01 13:27:56 UTC
Talked with Kay and we will add in 60-persistent-storage.rules, after:

# for partitions import parent information
ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*"

the line:

ENV{ID_FS_TYPE}=="?*", GOTO="persistent_storage_end"

Because 62-multipath.rules sets ENV{ID_FS_TYPE}="mpath_member", similar to "linux_raid_member" and "isw_raid_member" and "LVM2_member|LVM1_member", we can just ignore all the partitions of such a disk.

This will prevent the symlinks in /dev/disk/by-* and we should be fine again.
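
As an illustration of what the added match is meant to test (a non-empty ID_FS_TYPE inherited from the parent disk), the same condition can be sketched in shell. `parent_has_fs_type` is a hypothetical helper; it reads KEY=VALUE properties such as those printed by `udevadm info --query=property` for the parent disk.

```shell
# Hypothetical helper: reads the parent disk's udev properties on stdin and
# succeeds if ID_FS_TYPE is set to a non-empty value, which is what the
# "?*" glob in the rule matches.
parent_has_fs_type() {
    grep -q '^ID_FS_TYPE=.'
}
```

When the check succeeds for the parent, the partitions of that disk would skip the rest of the persistent-storage rules and get no by-label or by-uuid symlinks.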

Comment 10 Harald Hoyer 2014-12-01 13:35:11 UTC
Created attachment 963272 [details]
Proposed patch

Comment 11 Harald Hoyer 2014-12-01 13:37:59 UTC
Created attachment 963273 [details]
Proposed patch

Comment 13 Fabian Deutsch 2014-12-02 12:30:57 UTC
Harald, could you please take a look at bug 1155957 comment 8 and bug 1152948 comment 66?

It seems that the patch did not fix the issue.

Comment 14 Ben Marzinski 2014-12-02 17:46:47 UTC
Looking at the initramfs, the multipath device isn't listed in the wwids file.  This means that multipath will not immediately claim it.  Was this installed with this device multipathed?  If so, anaconda should have added the wwid.  If it was multipathed after installation, you need to remake your initramfs so it will pull the updated /etc/multipath/wwids file into the initramfs.  Once the initramfs wwids file includes the wwids of the multipath devices, this may just work without any changes to the udev rules.

Comment 15 Ryan Barry 2014-12-02 18:02:55 UTC
(In reply to Ben Marzinski from comment #14)
> Looking at the initramfs, the multipath device isn't listed in the wwids
> file.  This means that multipath will not immediately claim it.  Was this
> installed with this device multipathed?  If so, anaconda should have added
> the wwid.  If it was multipathed after installation, you need to remake your
> initramfs so it will pull the updated /etc/multipath/wwids file into the
> initramfs.  Once the initramfs wwids file includes the wwids of the
> multipath devices, this may just work without any changes to the udev rules.

RHEV-H directly lays down a squashfs image which is booted from, with some overlays/bind mounts for persistent data. Anaconda is not used for installation.

We also see this when booting from USB media created by dd-ing an ISO. There we would expect /etc/multipath/wwids to be empty even on a normal RHEL install, since the wwids file is empty on installation media and no installation has happened yet.

Comment 16 Harald Hoyer 2014-12-02 18:17:26 UTC
ok, I see:

multipath -c /dev/sda
…
libudev: udev_device_read_db: no db file to read /run/udev/data/b8:0
…
/dev/sda is a not a valid multipath device path
…

BUT!!!!!

multipathd[132]: sda add path (uevent)


So, the 62-multipath.rules rule never kicks in.

ACTION=="add", ENV{DM_MULTIPATH_DEVICE_PATH}!="1", \
        PROGRAM=="$env{MPATH_SBIN_PATH}/multipath -c $tempnode", \
        ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member"

I can cat /run/udev/data/b8:0 and it contains all keys.

So, "multipath -c" runs before udevd stores the keys in the file. It should probably take the device properties from the environment instead of reading them from the db file.

  PROGRAM
      Execute a program to determine whether there is a match; the key is true if the program returns successfully. The device properties are made
      available to the executed program in the environment. The program's standard output is available in the RESULT key.
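
The database file named in the log above follows the pattern /run/udev/data/b<major>:<minor> for block devices. A minimal sketch for checking whether that entry has been written yet; both helper names are hypothetical.

```shell
# Hypothetical helper: path of the udev database entry for a block device,
# given its major and minor numbers (e.g. 8 0 for /dev/sda).
udev_db_path() {
    printf '/run/udev/data/b%s:%s\n' "$1" "$2"
}

# Hypothetical helper: succeeds once the entry exists and is non-empty.
udev_db_ready() {
    [ -s "$(udev_db_path "$1" "$2")" ]
}
```

A program started from a PROGRAM key would see `udev_db_ready 8 0` fail during the first "add" event for sda, which is the window described in this comment.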

Comment 17 Harald Hoyer 2014-12-02 18:27:26 UTC
just add "rd.udev.log-priority=debug" to the kernel command line and see for yourself.

Comment 18 Harald Hoyer 2014-12-02 19:06:31 UTC
cloned the bug for the device-mapper-multipath part to bug 1169935

Comment 19 Ben Marzinski 2014-12-02 21:24:36 UTC
Ah... I see. The reason this doesn't happen in RHEL is that the multipath.conf file isn't empty there.

The multipath.conf file in your initramfs is

# cat etc/multipath.conf
#Use Defaults

It needs to at least have:

defaults {
        find_multipaths yes
}

The RHEL7 one has

defaults {
        find_multipaths yes
        user_friendly_names yes
}


The user_friendly_names option isn't necessary, but the way things currently work, find_multipaths is. Otherwise multipath will just grab every non-blacklisted device, which is why it is setting itself up on the USB device.  With this option set, multipath will only set itself up on devices that actually have multiple paths.
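
A minimal sketch for checking whether a multipath.conf (for example one extracted from the initramfs) sets this option. The helper name `has_find_multipaths` is hypothetical, and as a simplification it does not verify that the line sits inside the defaults section.

```shell
# Hypothetical helper: succeeds if the file contains an uncommented
# "find_multipaths yes" line (the value may be quoted).
# Usage: has_find_multipaths [FILE]   (FILE defaults to /etc/multipath.conf)
has_find_multipaths() {
    conf=${1:-/etc/multipath.conf}
    # Drop comment lines first, then look for the option.
    grep -v '^[[:space:]]*#' "$conf" 2>/dev/null |
        grep -E -q '^[[:space:]]*find_multipaths[[:space:]]+"?yes"?[[:space:]]*$'
}
```

A file that only contains "#Use Defaults" fails the check, as does a missing file.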

Comment 20 Ben Marzinski 2014-12-02 22:09:54 UTC
To deal with any issues when there really are multipath paths, I could add an option to multipathd so that, in the initramfs, it does not create multipath devices it has never seen before and only starts the ones that are already in the wwids file.  The question is what the easiest way to do that is.  Adding it as a command-line option would mean that multipathd needs different unit files in the initramfs and in the real filesystem.  The other option would be for multipathd to detect whether it is running in the initramfs, and enable this behaviour then.  I'm not sure how it would detect that, however.
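
One possible detection, offered only as an assumption and not as something multipathd does: initramfs images built by dracut normally ship an /etc/initrd-release file. A shell sketch with a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the given root looks like an initramfs,
# judged by the presence of etc/initrd-release.
# Usage: in_initramfs [ROOT]   (ROOT defaults to the running system's /)
in_initramfs() {
    [ -e "${1:-}/etc/initrd-release" ]
}
```

The optional ROOT argument exists only so the check can be tried against an unpacked image.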

Comment 21 Fabian Deutsch 2014-12-03 19:46:33 UTC
(In reply to Ben Marzinski from comment #19)

…

> It needs to at least have:
> 
> defaults {
>         find_multipaths yes
> }
> 
> The RHEL7 one has
> 
> defaults {
>         find_multipaths yes
...
> }


This is the config I'll add to our initramfs for now.
I hope that this will solve the most pressing problems.

Comment 24 Ben Marzinski 2014-12-16 17:42:30 UTC
Fabian, have you had a chance to test this? If the config change fixes it, should we move this bug over to RHEV, or are you tracking that change in a different bugzilla?

Comment 29 Lukáš Nykrýn 2015-01-06 11:50:28 UTC
This needs to be fixed upstream and tested in Rawhide first.

Comment 31 Fabian Deutsch 2015-01-07 13:47:54 UTC
Bottom line summary: While working on this bug we learned that the race between udev and multipath cannot be solved _without_ specifying the wwid either on the kernel command line (mpath.wwid) or in the wwids file (inside the initrd).

Specifying the wwid lets multipath directly claim any device that comes up with that specific wwid, which means that multipath wins the race.

If the wwid is not specified anywhere, multipath will only claim a device once there is already another device with the same wwid. In the meantime some other component (dm adding partitions) could have claimed the device, and then we have the problems that were described in the description.
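
A minimal sketch for checking the first of those two places, the kernel command line. The helper name `cmdline_wwids` is hypothetical; it only parses a string and does not talk to multipath.

```shell
# Hypothetical helper: print every mpath.wwid=<value> found in a kernel
# command line string, one value per line.
cmdline_wwids() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^mpath\.wwid=//p'
}
```

On a running system it could be called as `cmdline_wwids "$(cat /proc/cmdline)"`; empty output means no wwid was passed at boot.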

Comment 34 Fabian Deutsch 2015-07-01 10:46:21 UTC
Closing this bug according to comment 31.

After all, this can only be fixed if multipath knows the correct config at boot time, either through the kernel command line argument mpath.wwid or by creating an initramfs which contains the correct wwids file.

