We have multiple hosts with QLogic ISP4032 iSCSI cards in them. For each host, we configure multiple paths to the SAN, and then use multipath to present a single logical device. E.g.:

    mpath1 (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) dm-9 EQLOGIC,100E-00
    [size=8.0T][features=0][hwhandler=0][rw]
    \_ round-robin 0 [prio=0][active]
     \_ 1:0:2:0 sdb 8:16 [active][undef]
     \_ 2:0:2:0 sdc 8:32 [active][undef]

We then create a logical volume group on the multipath device (/dev/dm-9, in this case). This has worked fine for (literally) years.

But just recently, our hosts that run multipath have begun to fail to reboot. This happens because the device nodes for the multipath-backed filesystems haven't been created yet when rc.sysinit calls lvm.static:

    Setting up Logical Volume Management:   File-based locking initialisation failed.
      Couldn't find device with uuid xxxxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx.
      Refusing activation of partial LV f0. Use --partial to override.
      4 logical volume(s) in volume group "example0" now active
      6 logical volume(s) in volume group "example1" now active
                                                                   [FAILED]

This causes fsck to bomb out to single-user mode.

Through experimentation, we have discovered that if we add a delay to rc.sysinit between the call to /sbin/multipath.static and the call to /sbin/lvm.static, then the multipath device nodes will be present when /sbin/lvm.static runs, which means that all volume groups will initialize properly, and the boot will not fail.

So clearly, something has recently changed in the behavior of multipath, udev, and/or the kernel. It's an unfortunate interaction issue, but since it prevents hosts that have filesystems backed by multipath from rebooting successfully, this problem needs to be corrected immediately.

Versions:

    0:device-mapper-1.02.55-2.el5.x86_64
    0:device-mapper-event-1.02.55-2.el5.x86_64
    0:device-mapper-multipath-0.4.7-42.el5.x86_64
    0:kernel-2.6.18-238.1.1.el5.x86_64
    0:udev-095-14.24.el5.x86_64
Created attachment 477694
add a delay to rc.sysinit to work around the bug

This is how we've worked around the problem for now.
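For anyone who wants the gist without pulling the attachment, the change amounts to something like the following. This is a minimal sketch of the idea, not the actual attachment: the surrounding rc.sysinit lines are paraphrased rather than quoted, and the 5-second value is arbitrary.

    # /etc/rc.d/rc.sysinit (sketch, not a verbatim excerpt)

    # ... multipath has just been run ...
    /sbin/multipath.static -v 0

    # WORKAROUND: give udev time to create the /dev/dm-* nodes
    # before LVM scans for physical volumes.
    sleep 5

    # ... LVM activation follows ...
    /sbin/lvm.static vgchange -a y --ignorelockingfailure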
Cross-filed as Red Hat Support Case 00416066.
"We then create a logical volume group on the multipath device (/dev/dm-9, in this case)." Is there a reason you're using 'dm-9' (under async control of udev) there rather than /dev/mpath/<something> (under synchronous multipath control) ?
You could try 'udevsettle' instead of the sleep, or check for any references (e.g. in lvm.conf filters) to /dev/dm-* and replace them with /dev/mapper or /dev/mpath.
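If you go the udevsettle route, the call would sit in the same spot as the sleep; something like the line below (the 10-second timeout is just an example value):

    # Wait for udev to drain its event queue (which includes creating the
    # /dev/dm-* nodes) instead of sleeping for a fixed interval.
    /sbin/udevsettle --timeout=10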
The /dev/mpath symlinks are also controlled by udev in RHEL5. Afaik only the /dev/mapper device nodes are created synchronously. What filters are in use in lvm.conf?
This is the typical lvm.conf filter we use on our multipath hosts:

    filter = [ "a|/dev/sda|", "a|^/dev/dm|", "r/.*/" ]

The reason it's so restrictive is that we can't let LVM glom onto the block devices that correspond to the physical paths to the SAN; we need LVM to find only the multipath block device.

But based on your comments, it sounds like udev is responsible for (asynchronously) creating the /dev/dm* devices, after multipath (synchronously) creates the /dev/mapper/mpath* devices. Which means that we can eliminate the race condition in rc.sysinit by changing our lvm.conf filter specification to:

    filter = [ "a|/dev/sda|", "a|^/dev/mapper/|", "r/.*/" ]

Have I understood things correctly?
Exactly; this is the filter style I normally recommend for environments like yours: "Accept the things you explicitly want to be used as PVs and reject everything else." You can use udevsettle here if you want to sync up with the creation of the dm-N nodes, but I don't personally recommend it; allowing LVM2 to scan /dev/mapper and use those nodes is more robust because they are created synchronously with respect to the devices.
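For reference, assuming an otherwise stock lvm.conf, the filter above lives in the devices section (the /dev/sda entry is specific to this reporter's local disk):

    # /etc/lvm/lvm.conf (relevant excerpt)
    devices {
        # Accept the local system disk and the synchronously created
        # /dev/mapper multipath nodes; reject everything else, including
        # the per-path /dev/sd* devices and the async /dev/dm-* nodes.
        filter = [ "a|/dev/sda|", "a|^/dev/mapper/|", "r/.*/" ]
    }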
Confirmed; the system boots as expected (without needing the sleep or udevsettle) if LVM looks for /dev/mapper/* nodes instead of /dev/dm-* nodes.

As it turns out, udev creates /dev/dm-9 only ~1.2 seconds after multipath creates /dev/mapper/mpath1:

    $ stat /dev/mapper/mpath1 /dev/dm-9 | grep Change
    Change: 2011-02-09 15:19:25.254970688 -0500
    Change: 2011-02-09 15:19:26.489970688 -0500

...but as we've seen, rc.sysinit calls lvm quickly enough after calling multipath to beat that window.

I see now that the RHEL5 DM Multipath guide (section 2.1, "Multipath Device Identifiers") gives explicit advice to prefer the /dev/mapper/mpath* devices over /dev/mpath/mpath* and /dev/dm-*, for exactly this reason. So this is our fault for not following that advice. Thanks for setting us straight.

Please close as NOTABUG (or whatever is appropriate), as I don't seem to be able to close this ticket myself.
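For anyone else who lands on this bug: a quick way to confirm which nodes LVM is actually using after the filter change is to ask pvs for the device names. The output shown here is illustrative, not captured from the reporter's system.

    # With the /dev/mapper filter in place, the PVs should be reported by
    # their /dev/mapper names rather than /dev/dm-* or the per-path /dev/sd*.
    $ pvs -o pv_name,vg_name
      PV                  VG
      /dev/mapper/mpath1  example0      <- illustrative output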