Red Hat Bugzilla – Bug 220643
timing issue with EMC Clariion
Last modified: 2012-06-04 15:47:53 EDT
Description of problem:
On the nahant-list, several of us are tracking what appears to be a timing issue
with dm-multipath and EMC Clariions. The basic problem occurs when you have a mount
point like this in /etc/fstab:
/dev/dm-1 /disk/san ext3 defaults 1 3
The mount works as expected once the system is up, but the system will not boot correctly.
During the boot process fsck will complain that it could not read the superblock
and will also give a "no such file or directory" error for /dev/dm-1. Then you
are prompted for the root password.
After typing in the root password, we are always able to fsck, mount, or otherwise
use the device, and it appears to work normally.
Several workarounds have been proposed for this problem: inserting sleep
commands into rc.sysinit, using LVM (which introduces an additional delay), and
building an initrd that loads the needed modules into the kernel earlier in the
boot process.
If there's a timing issue here, then it needs a better fix than these workarounds.
Key points in the threads:
Latest thread with most of our current discussion:
Version-Release number of selected component (if applicable):
kernel = 2.6.9-42.0.3.ELsmp (i386)
I am also experiencing this issue. I tested both the "sleep" workaround and the
LVM workaround, and both do avoid this problem, but they are still workarounds:
LVM isn't always an option, and having the customer add a sleep line to
rc.sysinit isn't a good solution.
We, as customers, are looking to you, as our provider, for a solution to what
appears to be a timing issue.
This is actually a udev timing issue that can affect any multipathed device at
boot, and there is a simple way to avoid it.
The /dev/mpath directory exists because people wanted an easy way to see all
their multipathed devices, and only their multipathed devices. It is full of
symlinks to the actual /dev/dm-* devices. These symlinks are created by udev.
Unfortunately, udev sometimes doesn't create a symlink before the boot scripts
try to use it. The /dev/mapper directory, by contrast, contains actual device
nodes for all device-mapper devices. These device nodes are created by
device-mapper itself when it creates the devices, so they exist as soon as the
devices do.
If you use /dev/mapper/<multipath_device_name> in /etc/fstab instead of
/dev/mpath/<multipath_device_name>, you should not run into this issue. So I
think this just needs some documentation fixes. It would be really helpful if
people could verify that changing the fstab line fixes their problems, though.
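As a rough illustration of the fstab change suggested above, here is a minimal Python sketch. The helper name `suggest_mapper_paths` is hypothetical and not part of any Red Hat or multipath tool. It assumes, as described above, that a multipath device has the same name under /dev/mpath and /dev/mapper, so it rewrites /dev/mpath/<name> device fields to /dev/mapper/<name>. It only flags /dev/dm-* entries, because the dm-N number by itself does not say which /dev/mapper name to use.

```python
def suggest_mapper_paths(fstab_text):
    """Return (new_text, warnings) for the given /etc/fstab contents.

    Hypothetical helper: rewrites /dev/mpath/<name> device fields to
    /dev/mapper/<name> and only warns about /dev/dm-* device fields.
    """
    out_lines = []
    warnings = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments exactly as they are.
        if not stripped or stripped.startswith("#"):
            out_lines.append(line)
            continue
        fields = stripped.split()
        device = fields[0]
        mount_point = fields[1] if len(fields) > 1 else "?"
        if device.startswith("/dev/mpath/"):
            # /dev/mpath/<name> is a udev symlink that may not exist yet
            # early in boot; device-mapper creates /dev/mapper/<name> itself.
            fields[0] = "/dev/mapper/" + device[len("/dev/mpath/"):]
            out_lines.append(" ".join(fields))
        elif device.startswith("/dev/dm-"):
            # The dm-N number alone doesn't say which /dev/mapper name to
            # use (and may change between boots), so flag it, don't guess.
            warnings.append(f"{device} ({mount_point}): use /dev/mapper/<name> instead")
            out_lines.append(line)
        else:
            out_lines.append(line)
    return "\n".join(out_lines), warnings
```

For example, given a line "/dev/mpath/mpath0p1 /disk/san ext3 defaults 1 2", the sketch returns it as "/dev/mapper/mpath0p1 /disk/san ext3 defaults 1 2". Rewritten lines are re-joined with single spaces, so any column alignment in the original file is lost; review the output before writing it back to /etc/fstab.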
Created attachment 144762
This attachment is the /etc/fstab I am currently using. It attempts to mount
/dev/dm-1, not a device from /dev/mpath. Does /dev/dm-1 have the same issue?
My teammate changed the mount options a bit so that it isn't fsck'd at boot, so
the fstab line from the original description may be the more accurate one.
Hmm... apparently adding an attachment doesn't kick the bug out of NEEDINFO.
I'll try using /dev/mapper/mpath0p1 as the device tomorrow to confirm.
Yes, udev also creates the /dev/dm-* devices... I think. It's possible that
device-mapper creates them too; I don't remember offhand. At any rate, try the
/dev/mapper way and see if that fixes the problem.
There is a completely unrelated problem with using /dev/dm-* in your /etc/fstab,
however. Device mapper makes no promise that a device named /dev/dm-<foo> on one
boot will be named /dev/dm-<foo> on the next boot. Multipath does guarantee that
the devices named /dev/mapper/<foo> and /dev/mpath/<foo> will always have that
name on that machine.
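If an existing fstab already refers to a /dev/dm-N node, one way to find the stable /dev/mapper name to switch to is to compare device numbers. The following is a minimal Python sketch under stated assumptions: `mapper_name_for` and `match_by_rdev` are hypothetical helper names, and the script is assumed to run on the live system after boot, when the devices already exist. It compares `st_rdev` (major/minor) values rather than names, so it should work whether /dev/mapper holds real device nodes, as described above, or symlinks to ../dm-N.

```python
import os
import stat


def match_by_rdev(target_rdev, entries):
    """Return the name whose device number equals target_rdev, else None.

    entries is an iterable of (name, st_rdev) pairs.
    """
    for name, rdev in entries:
        if rdev == target_rdev:
            return name
    return None


def mapper_name_for(dev_path, mapper_dir="/dev/mapper"):
    """Find the /dev/mapper entry for a node such as /dev/dm-1.

    Hypothetical helper: matches block device numbers (st_rdev), so it
    does not care whether mapper entries are nodes or symlinks.
    """
    st = os.stat(dev_path)
    if not stat.S_ISBLK(st.st_mode):
        return None  # not a block device, nothing to match
    entries = []
    for name in os.listdir(mapper_dir):
        try:
            # os.stat follows symlinks, so ../dm-N links resolve too.
            est = os.stat(os.path.join(mapper_dir, name))
        except OSError:
            continue  # dangling symlink or entry removed meanwhile
        # Skips /dev/mapper/control, which is a character device.
        if stat.S_ISBLK(est.st_mode):
            entries.append((name, est.st_rdev))
    return match_by_rdev(st.st_rdev, entries)
```

Calling `mapper_name_for("/dev/dm-1")` returns the matching /dev/mapper name, or None if nothing matches. Because dm-N numbering can change between boots, the lookup should be done on the running system and the result written into /etc/fstab as /dev/mapper/<name>, rather than repeated at every boot.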
Even more unrelated information: if you have multiple machines accessing the
same multipathed device and you are using the user_friendly_names
multipath.conf option (which gives you the mpath<n> names instead of the really
ugly WWID names), there is no guarantee that all the machines will use the same
name to refer to the same device. To make them do so, run multipath on one
machine to create and name all the multipath devices, and then copy the
/var/lib/multipath/bindings file from it to all the other machines. This will
cause all the machines to have the same WWID-to-user_friendly_name mappings.
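Before or after copying the bindings file, it can help to check whether two machines actually agree. Below is a minimal Python sketch, assuming the usual bindings file layout of one "alias WWID" pair per line with "#" comments. The function names and the sample WWIDs in the test values are hypothetical, for illustration only. It reports every WWID that the two hosts know under different user_friendly_names.

```python
def parse_bindings(text):
    """Parse bindings-file text into {alias: wwid}.

    Assumes one "alias wwid" pair per line; '#' starts a comment.
    """
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            bindings[parts[0]] = parts[1]
    return bindings


def alias_mismatches(host_a, host_b):
    """Return (wwid, alias_on_a, alias_on_b) where the hosts disagree."""
    # Invert alias -> wwid so the two hosts can be compared per device.
    by_wwid_a = {wwid: alias for alias, wwid in host_a.items()}
    by_wwid_b = {wwid: alias for alias, wwid in host_b.items()}
    mismatches = []
    for wwid in sorted(set(by_wwid_a) & set(by_wwid_b)):
        if by_wwid_a[wwid] != by_wwid_b[wwid]:
            mismatches.append((wwid, by_wwid_a[wwid], by_wwid_b[wwid]))
    return mismatches
```

An empty result means every WWID seen on both hosts maps to the same mpath<n> name. Copying one machine's bindings file to the others, as described above, is what should make that true.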
Indeed, using the following in my fstab does work. The machine boots properly
without a timing or fsck error.
/dev/mapper/mpath0p1 /disk/san ext3 defaults 1 2
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.