Bug 220643

Summary: timing issue with EMC Clariion
Product: Red Hat Enterprise Linux 4
Reporter: Jack Neely <jjneely>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Corey Marthaler <cmarthal>
Severity: medium
Priority: medium
Version: 4.4
CC: agk, ben.spencer, bmarzins, christophe.varoqui, dwysocha, egoggin, junichi.nomura, kueda, lmb, mbroz, michael.hagmann, prockai, rkenna, tranlan
Hardware: All   
OS: Linux   
Fixed In Version: RHEA-2007-0256
Doc Type: Bug Fix
Last Closed: 2007-05-01 17:48:14 UTC
Attachments:
Description Flags
fstab none

Description Jack Neely 2006-12-22 18:00:09 UTC
Description of problem:

On the nahant-list several of us are tracking what appears to be a timing issue
with dm-multipath and EMC Clariions.  The basic problem is when you have a mount
point in the fstab:

/dev/dm-1           /disk/san           ext3    defaults 1 3 

The mount point works as expected, but the system will not boot correctly. 
During the boot process fsck will complain that it could not read the superblock
and will also give a "no such file or directory" error for /dev/dm-1.  Then you
are prompted for the root password.

After typing in the root password, we are always able to fsck/mount/whatever the
device, and it appears to be working normally.

Several techniques have been presented for correcting this problem: inserting
sleep commands into rc.sysinit, using LVM (which creates an additional delay),
and creating an initrd that inserts the needed modules into the kernel earlier
in the boot process.

If there's a timing issue here, then that's something that needs a better fix. 
Key points in the threads:

http://www.redhat.com/archives/nahant-list/2006-August/msg00319.html

Me specifically:
http://www.redhat.com/archives/nahant-list/2006-December/msg00192.html

Latest thread with most of our current discussion:
http://www.redhat.com/archives/nahant-list/2006-December/msg00194.html

Version-Release number of selected component (if applicable):
kernel = 2.6.9-42.0.3.ELsmp (i386)
device-mapper-multipath-0.4.5-16.1.RHEL4

Comment 1 Benji Spencer 2007-01-02 18:56:14 UTC
I am experiencing this issue also. I tested both the "sleep" workaround and the
LVM workaround, and both do avoid the problem. They are still workarounds,
though: LVM isn't always an option, and having the customer add a sleep line to
rc.sysinit isn't a good solution.

We, as customers, are looking to you, as our provider, for a solution to what
appears to be a timing issue.

Comment 3 Ben Marzinski 2007-01-03 23:34:58 UTC
This is actually a udev timing issue that can affect any multipathed devices on
boot, and there is a simple way to avoid this.

The /dev/mpath directory exists because people wanted an easy way to see all
their multipathed devices, and only their multipathed devices. It is full of
symlinks to the actual /dev/dm-* devices. These symlinks are created by udev.
Unfortunately, sometimes udev doesn't create the symlink fast enough. The
/dev/mapper directory contains actual device nodes for all device mapper
devices. These device nodes are created by device-mapper itself, when it creates
the actual devices, so as soon as the device exists, they will exist.

If you use /dev/mapper/<multipath_device_name> in /etc/fstab, instead of
/dev/mpath/<multipath_device_name> you should not run into this issue. So I
think this just needs some documentation fixes. It would be really helpful if
people could verify that changing the fstab line fixes their problems, though.
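
For anyone with several fstab entries to change, the substitution suggested
above can be sketched in a few lines of Python. This is only an illustration:
the function name use_mapper_paths is hypothetical, and it assumes the
multipath name is the same under /dev/mpath and /dev/mapper, as described
in this comment. It only touches the device field at the start of a line, so
commented-out entries are left alone.

```python
import re

# Matches the device field of an fstab entry that points into /dev/mpath.
# Lines starting with "#" do not match, so comments are left untouched.
MPATH_PREFIX = re.compile(r"^(\s*)/dev/mpath/")


def use_mapper_paths(fstab_text):
    """Return fstab text with /dev/mpath/<name> devices rewritten
    to /dev/mapper/<name>. All other lines are returned unchanged."""
    rewritten = []
    for line in fstab_text.splitlines(keepends=True):
        rewritten.append(MPATH_PREFIX.sub(r"\1/dev/mapper/", line))
    return "".join(rewritten)


if __name__ == "__main__":
    sample = "/dev/mpath/mpath0p1  /disk/san  ext3  defaults  1 2\n"
    print(use_mapper_paths(sample), end="")
```

Review the result before installing it as /etc/fstab; the sketch does not
check that the /dev/mapper node actually exists.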

Comment 4 Jack Neely 2007-01-04 02:47:38 UTC
Created attachment 144762 [details]
fstab

This attachment is the /etc/fstab I am currently using.  It attempts to mount
/dev/dm-1, not a device from /dev/mpath.  Does /dev/dm-1 have the same
characteristics?

My teammate has changed the mount options a bit so it's not fsck'd on boot.  So
the fstab line from the original comment may be the most accurate one.

Comment 5 Jack Neely 2007-01-04 02:52:36 UTC
Hmmm...apparently adding an attachment doesn't kick the bug out of NEEDINFO
state.  *kick*

I'll attempt to try using /dev/mapper/mpath0p1 as the device to confirm tomorrow.

Comment 6 Ben Marzinski 2007-01-04 04:03:37 UTC
Yes, udev also creates the /dev/dm-* devices... I think.  It's possible that
device-mapper creates them too, I don't remember offhand. At any rate, try the
/dev/mapper way, and see if that fixes the problem.

There is a completely unrelated problem with using /dev/dm-* in your /etc/fstab,
however. Device mapper makes no promises that a device that is named
/dev/dm-<foo> on one boot will be named /dev/dm-<foo> on the next boot.
Multipath does guarantee that the devices named /dev/mapper/<foo> and
/dev/mpath/<foo> will always have those names on that machine.
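
If an fstab already refers to a /dev/dm-<foo> node, one way to find the stable
name to use instead is to look for the entry under /dev/mapper that has the
same device number. The following Python sketch shows the idea; the function
name mapper_name_for and its parameters are hypothetical, and it assumes that
both paths resolve to the same block device on the running system.

```python
import os
import stat


def mapper_name_for(dm_path, mapper_dir="/dev/mapper",
                    stat_fn=os.stat, listdir_fn=os.listdir):
    """Return the path under mapper_dir whose device number matches
    dm_path, or None if no entry matches.

    stat_fn and listdir_fn are injectable so the lookup can be tried
    without real device nodes."""
    target = stat_fn(dm_path)
    if not stat.S_ISBLK(target.st_mode):
        raise ValueError("%s is not a block device" % dm_path)
    for name in sorted(listdir_fn(mapper_dir)):
        candidate = os.path.join(mapper_dir, name)
        info = stat_fn(candidate)
        # Non-block entries (such as the control node) are skipped.
        if stat.S_ISBLK(info.st_mode) and info.st_rdev == target.st_rdev:
            return candidate
    return None
```

Calling mapper_name_for("/dev/dm-1") on a running system would return the
matching /dev/mapper path if there is one. The answer is only valid for the
current boot's dm-<foo> numbering, which is exactly why the /dev/mapper name
is the one to put in fstab.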

Even more unrelated information: If you have multiple machines accessing the
same multipathed device, and you are using the user_friendly_names
multipath.conf option (it gives you the mpath<n> names instead of the really
ugly WWID names), there is no guarantee that all the different machines will use
the same name to refer to the same device.  In order to get them to do so, run
multipath on one machine to create and name all the multipath devices and then
copy the /var/lib/multipath/bindings file from it to all the other machines.
This will cause all the machines to have the same WWID = user_friendly_name
bindings.
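
Before copying the bindings file around, it can be useful to see whether two
machines already disagree. Here is a minimal Python sketch of such a check.
It assumes each non-comment line of the bindings file has the form
"alias wwid" with "#" starting a comment; that layout is an assumption and
is not spelled out in this report, and the function names are hypothetical.

```python
def parse_bindings(text):
    """Parse bindings-file text into a {wwid: alias} dict.

    Assumes 'alias wwid' per line; blank lines, comments, and
    malformed lines are skipped."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2:
            continue
        alias, wwid = fields[0], fields[1]
        bindings[wwid] = alias
    return bindings


def binding_mismatches(reference_text, other_text):
    """Return {wwid: (reference_alias, other_alias)} for every WWID that
    both files know about but name differently."""
    reference = parse_bindings(reference_text)
    other = parse_bindings(other_text)
    return {
        wwid: (reference[wwid], other[wwid])
        for wwid in reference
        if wwid in other and reference[wwid] != other[wwid]
    }
```

Feed it the contents of /var/lib/multipath/bindings from two machines. An
empty result means the WWIDs they share already have the same names; WWIDs
known to only one machine are not reported.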

Comment 7 Jack Neely 2007-01-04 18:00:16 UTC
Indeed, using the following in my fstab does work.  The machine boots properly
without a timing or fsck error.

   /dev/mapper/mpath0p1    /disk/san               ext3    defaults        1 2


Comment 13 Red Hat Bugzilla 2007-05-01 17:48:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0256.html