Description of problem:

On the nahant-list several of us are tracking what appears to be a timing issue with dm-multipath and EMC Clariions. The basic problem is when you have a mount point in the fstab:

    /dev/dm-1 /disk/san ext3 defaults 1 3

The mount point works as expected, but the system will not boot correctly. During the boot process fsck will complain that it could not read the superblock and will also give a "no such file or directory" error for /dev/dm-1. Then you are prompted for the root password. After typing in the root password we are always able to fsck/mount/whatever the device, and it appears to be working normally.

Several techniques have been presented for correcting this problem: inserting sleep commands into rc.sysinit, using LVM (which creates an additional delay), and creating an initrd that inserts the needed modules into the kernel earlier in the boot process. If there's a timing issue here, then that's something that needs a better fix.

Key points in the threads:
http://www.redhat.com/archives/nahant-list/2006-August/msg00319.html

Me specifically:
http://www.redhat.com/archives/nahant-list/2006-December/msg00192.html

Latest thread with most of our current discussion:
http://www.redhat.com/archives/nahant-list/2006-December/msg00194.html

Version-Release number of selected component (if applicable):
kernel = 2.6.9-42.0.3.ELsmp (i386)
device-mapper-multipath-0.4.5-16.1.RHEL4
I am experiencing this issue also. I tested both the "sleep" workaround and the LVM workaround, and both do avoid the problem. They are still workarounds, though: LVM isn't always an option, and having the customer add a sleep line to rc.sysinit isn't a good solution. We, as customers, are looking to you, as our provider, for a solution to what appears to be a timing issue.
This is actually a udev timing issue that can affect any multipathed device on boot, and there is a simple way to avoid it.

The /dev/mpath directory exists because people wanted an easy way to see all their multipathed devices, and only their multipathed devices. It is full of symlinks to the actual /dev/dm-* devices. These symlinks are created by udev. Unfortunately, sometimes udev doesn't create the symlink fast enough.

The /dev/mapper directory contains actual device nodes for all device mapper devices. These device nodes are created by device-mapper itself when it creates the actual devices, so as soon as a device exists, its node will exist.

If you use /dev/mapper/<multipath_device_name> in /etc/fstab instead of /dev/mpath/<multipath_device_name>, you should not run into this issue. So I think this just needs some documentation fixes. It would be really helpful if people could verify that changing the fstab line fixes their problems, though.
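For anyone with more than a couple of entries to change, here is a minimal sketch of the substitution described above: it rewrites the device field of each fstab line from /dev/mpath/<name> to /dev/mapper/<name>. The function names are hypothetical, and it assumes the device is the first whitespace-separated field on each non-comment line. It only transforms text; review the result before writing it back to /etc/fstab.

```python
# Sketch only: rewrite /dev/mpath/<name> devices to /dev/mapper/<name>.
# Assumption: the device is the first whitespace-separated field of each
# non-comment fstab line. Function names here are illustrative.

MPATH_PREFIX = "/dev/mpath/"
MAPPER_PREFIX = "/dev/mapper/"


def rewrite_fstab_line(line):
    """Return the line with a /dev/mpath/ device swapped for /dev/mapper/."""
    stripped = line.lstrip()
    # Leave blank lines and comments untouched.
    if not stripped or stripped.startswith("#"):
        return line
    device = stripped.split()[0]
    if device.startswith(MPATH_PREFIX):
        new_device = MAPPER_PREFIX + device[len(MPATH_PREFIX):]
        # The device is the first token, so replacing the first
        # occurrence only touches the device field.
        return line.replace(device, new_device, 1)
    return line


def rewrite_fstab(text):
    """Apply rewrite_fstab_line to every line, preserving line endings."""
    return "".join(
        rewrite_fstab_line(line) for line in text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    sample = "/dev/mpath/mpath0p1 /disk/san ext3 defaults 1 2\n"
    print(rewrite_fstab(sample), end="")
```

Entries that use /dev/dm-<n> are deliberately left alone, since the matching /dev/mapper name cannot be derived from the dm number.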
Created attachment 144762 [details] fstab

This attachment is the /etc/fstab I am currently using. It attempts to mount /dev/dm-1, not a device from /dev/mpath. Does /dev/dm-1 have the same characteristics? My teammate has changed the mount options a bit so it's not fsck'd on boot, so the fstab line from the original comment may be the most accurate one.
Hmmm...apparently adding an attachment doesn't kick the bug out of NEEDINFO state. *kick* I'll try using /dev/mapper/mpath0p1 as the device tomorrow to confirm.
Yes, udev also creates the /dev/dm-* devices... I think. It's possible that device-mapper creates them too, I don't remember offhand. At any rate, try the /dev/mapper way, and see if that fixes the problem.

There is a completely unrelated problem with using /dev/dm-* in your /etc/fstab, however. Device mapper makes no promises that a device that is named /dev/dm-<foo> on one boot will be named /dev/dm-<foo> on the next boot. Multipath does guarantee that the device named /dev/mapper/<foo> and /dev/mpath/<foo> will always have that name on that machine.

Even more unrelated information: If you have multiple machines accessing the same multipathed device, and you are using the user_friendly_names multipath.conf option (it gives you the mpath<n> names instead of the really ugly WWID names), there is no guarantee that all the different machines will use the same name to refer to the same device. To get them to do so, run multipath on one machine to create and name all the multipath devices, and then copy the /var/lib/multipath/bindings file from it to all the other machines. This will cause all the machines to have the same WWID = user_friendly_name bindings.
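If you want to check whether two machines already agree before copying the bindings file around, something like the following rough sketch could help. It assumes each non-comment line of the bindings file has the form "alias wwid" (verify that against your own /var/lib/multipath/bindings first), and the function names and the WWIDs in the example are made up for illustration.

```python
# Sketch only: compare the contents of two bindings files and report
# WWIDs that are bound to different user-friendly names.
# Assumption: each non-comment line is "<alias> <wwid>".


def parse_bindings(text):
    """Return a dict mapping WWID -> alias from bindings-file text."""
    bindings = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # Skip lines that do not match the assumed format.
        alias, wwid = parts
        bindings[wwid] = alias
    return bindings


def mismatched_bindings(reference_text, other_text):
    """Return {wwid: (reference_alias, other_alias)} where the names differ."""
    reference = parse_bindings(reference_text)
    other = parse_bindings(other_text)
    return {
        wwid: (reference[wwid], other[wwid])
        for wwid in reference
        if wwid in other and reference[wwid] != other[wwid]
    }


if __name__ == "__main__":
    # Hypothetical WWIDs, for illustration only.
    machine_a = "mpath0 wwid-aaaa\nmpath1 wwid-bbbb\n"
    machine_b = "mpath0 wwid-bbbb\nmpath1 wwid-aaaa\n"
    for wwid, (a, b) in sorted(mismatched_bindings(machine_a, machine_b).items()):
        print(f"{wwid}: {a} on reference, {b} on other")
```

An empty result means every WWID known to both machines has the same name on both; WWIDs that only one machine knows about are not reported.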
Indeed, using the following in my fstab does work. The machine boots properly without a timing or fsck error. /dev/mapper/mpath0p1 /disk/san ext3 defaults 1 2
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2007-0256.html