Bug 676123 - multipath returns before all device nodes exist
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Ben Marzinski
QA Contact: Red Hat Kernel QE team
Reported: 2011-02-08 21:29 UTC by James Ralston
Modified: 2018-11-28 20:29 UTC (History)

Doc Type: Bug Fix
Last Closed: 2011-02-12 17:42:08 UTC

Attachments
add a delay to rc.sysinit to work around the bug (697 bytes, patch)
2011-02-08 21:30 UTC, James Ralston

Description James Ralston 2011-02-08 21:29:18 UTC
We have multiple hosts with QLogic ISP4032 iSCSI cards in them. For each host, we configure multiple paths to the SAN, and then use multipath to present a single logical device.  E.g.:

mpath1 (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) dm-9 EQLOGIC,100E-00
[size=8.0T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:2:0 sdb 8:16  [active][undef]
 \_ 2:0:2:0 sdc 8:32  [active][undef]

We then create a logical volume group on the multipath device (/dev/dm-9, in this case). This has worked fine for (literally) years.

But just recently, our hosts that run multipath have begun to fail to reboot. This happens because the device nodes for the multipath-backed filesystems haven't been created yet when rc.sysinit calls lvm.static:

Setting up Logical Volume Management:   File-based locking initialisation failed
.
  Couldn't find device with uuid xxxxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx.
  Refusing activation of partial LV f0. Use --partial to override.
  4 logical volume(s) in volume group "example0" now active
  6 logical volume(s) in volume group "example1" now active
[FAILED]

This causes fsck to bomb out to single-user mode.

Through experimentation, we have discovered that if we add a delay to rc.sysinit between the call to /sbin/multipath.static and the call to /sbin/lvm.static, then the multipath device nodes will be present when /sbin/lvm.static runs, which means that all volume groups will initialize properly, and the boot will not fail.
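
For illustration, the difference between a fixed sleep and a bounded wait for the device node can be sketched as follows. This is not the attached patch and not something rc.sysinit runs; `wait_for_path` and its parameters are hypothetical names, and Python is used only to show the logic.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds have passed.

    Hypothetical helper: returns True as soon as the node appears,
    False if it is still missing when the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_for_path("/dev/dm-9", timeout=5.0)` returns as soon as the node shows up instead of always paying the full delay, but like the sleep it only papers over the ordering problem.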

So clearly, something has recently changed in the behavior of multipath, udev, and/or the kernel. It's an unfortunate interaction issue, but since it prevents hosts that have filesystems backed by multipath from successfully rebooting, this problem needs to be corrected immediately.

Versions:

0:device-mapper-1.02.55-2.el5.x86_64
0:device-mapper-event-1.02.55-2.el5.x86_64
0:device-mapper-multipath-0.4.7-42.el5.x86_64
0:kernel-2.6.18-238.1.1.el5.x86_64
0:udev-095-14.24.el5.x86_64

Comment 1 James Ralston 2011-02-08 21:30:33 UTC
Created attachment 477694
add a delay to rc.sysinit to work around the bug

This is how we've worked around the problem for now.

Comment 2 James Ralston 2011-02-08 21:39:05 UTC
Cross-filed as Red Hat Support Case 00416066.

Comment 4 Alasdair Kergon 2011-02-08 23:46:41 UTC
"We then create a logical volume group on the multipath device (/dev/dm-9, in
this case)."

Is there a reason you're using 'dm-9' (under async control of udev) there rather than /dev/mpath/<something> (under synchronous multipath control)?

Comment 5 Alasdair Kergon 2011-02-08 23:53:58 UTC
You could try 'udevsettle' instead of the sleep, or check for any references to /dev/dm-* (e.g. in lvm.conf filters) and replace them with /dev/mapper or /dev/mpath.

Comment 6 Bryn M. Reeves 2011-02-09 10:51:44 UTC
The /dev/mpath symlinks are also controlled by udev in RHEL5. AFAIK, only the /dev/mapper device nodes are created synchronously.

What filters are in use in lvm.conf?

Comment 7 James Ralston 2011-02-09 17:59:38 UTC
This is the typical lvm.conf filter we use on our multipath hosts:

filter = [ "a|/dev/sda|", "a|^/dev/dm|", "r/.*/" ]

It's so restrictive because we can't let LVM glom onto the block devices that correspond to the physical paths to the SAN; we need LVM to find only the multipath block device.

But based on your comments, it sounds like udev is responsible for (asynchronously) creating the /dev/dm* devices, after multipath (synchronously) creates the /dev/mapper/mpath* devices. That means we can eliminate the race condition in rc.sysinit by changing our lvm.conf filter specification to:

filter = [ "a|/dev/sda|", "a|^/dev/mapper/|", "r/.*/" ]

Have I understood things correctly?
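
The effect of the two filter lines can be checked with a short sketch. The Python below is an illustration only, not LVM code: it assumes the lvm.conf behaviour that patterns are unanchored regular expressions, that the first matching pattern decides, and that a name matching no pattern is accepted. The helper names `parse_rule` and `accepts` are hypothetical, and the sketch judges one path name at a time, whereas LVM considers every name a device is known by.

```python
import re

# The two filter lists quoted in this comment.
OLD_FILTER = ["a|/dev/sda|", "a|^/dev/dm|", "r/.*/"]
NEW_FILTER = ["a|/dev/sda|", "a|^/dev/mapper/|", "r/.*/"]


def parse_rule(rule):
    """Split 'a|regex|' or 'r/regex/' into (action, compiled regex)."""
    if len(rule) < 3:
        raise ValueError("rule too short: %r" % rule)
    action, delimiter = rule[0], rule[1]
    if action not in ("a", "r") or not rule.endswith(delimiter):
        raise ValueError("malformed rule: %r" % rule)
    return action, re.compile(rule[2:-1])


def accepts(filter_rules, path):
    """Return True if `path` is accepted; the first matching rule wins."""
    for rule in filter_rules:
        action, regex = parse_rule(rule)
        if regex.search(path):  # unanchored unless the pattern uses ^
            return action == "a"
    return True  # assumption: a name that matches nothing is accepted


for path in ("/dev/sda2", "/dev/sdb", "/dev/dm-9", "/dev/mapper/mpath1"):
    print(path, accepts(OLD_FILTER, path), accepts(NEW_FILTER, path))
```

Under these assumptions the old filter admits the multipath device only by its /dev/dm-9 name, which udev creates later, while the new filter admits it only by its /dev/mapper/mpath1 name; the path device /dev/sdb is rejected by both.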

Comment 8 Bryn M. Reeves 2011-02-09 18:19:14 UTC
Exactly; this is the filter style that I normally recommend for environments like yours.

"Accept the things you explicitly want to be used as PVs and reject everything else."

You can use udevsettle here if you want to sync up with the creation of the dm-N nodes, but I don't personally recommend it - allowing LVM2 to scan into /dev/mapper and use those nodes is more robust because they are created synchronously with respect to the devices.

Comment 9 James Ralston 2011-02-09 20:49:27 UTC
Confirmed; the system boots as expected (without needing sleep/udevsettle) if LVM looks for /dev/mapper/* nodes instead of /dev/dm-* nodes.

As it turns out, udev creates /dev/dm-9 only ~1.2 seconds after multipath creates /dev/mapper/mpath1:

$ stat /dev/mapper/mpath1 /dev/dm-9 | grep Change
Change: 2011-02-09 15:19:25.254970688 -0500
Change: 2011-02-09 15:19:26.489970688 -0500

...but as we've seen, rc.sysinit calls lvm quickly enough after calling multipath to beat that window.
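
The gap between the two Change timestamps can be computed directly. This is a minimal sketch, assuming the lines are in the `stat` output format shown above; `change_epoch` is a hypothetical helper name, and Decimal is used to keep the nanosecond digits that a float would round.

```python
from datetime import datetime
from decimal import Decimal


def change_epoch(line):
    """Convert a 'Change: YYYY-MM-DD HH:MM:SS.nnnnnnnnn +ZZZZ' line
    into seconds since the epoch, keeping nanosecond precision."""
    _label, date, clock, offset = line.split()
    whole, frac = clock.split(".")
    stamp = datetime.strptime(
        "%s %s %s" % (date, whole, offset), "%Y-%m-%d %H:%M:%S %z"
    )
    return Decimal(int(stamp.timestamp())) + Decimal("0." + frac)


mapper_node = change_epoch("Change: 2011-02-09 15:19:25.254970688 -0500")
dm_node = change_epoch("Change: 2011-02-09 15:19:26.489970688 -0500")

gap = dm_node - mapper_node
print("udev lagged multipath by", gap, "seconds")
```

For these two values the gap is 1.235 seconds, consistent with the ~1.2 seconds quoted above.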

I see now that the RHEL5 DM Multipath guide (section 2.1, "Multipath Device Identifiers") explicitly advises preferring the /dev/mapper/mpath* devices over /dev/mpath/mpath* and /dev/dm-*, for exactly this reason. So, this is our fault for not following that advice. Thanks for setting us straight.

Please close as NOTABUG (or appropriate), as I don't seem to be able to close this ticket myself.