220643 – timing issue with EMC Clariion

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	device-mapper-multipath
Version:	4.4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Assignee:	Ben Marzinski
QA Contact:	Corey Marthaler

Reported:	2006-12-22 18:00 UTC by Jack Neely
Modified:	2018-11-28 20:29 UTC
CC List:	14 users
Fixed In Version:	RHEA-2007-0256
Doc Type:	Bug Fix
Last Closed:	2007-05-01 17:48:14 UTC

Attachments
fstab (1.31 KB, text/plain) 2007-01-04 02:47 UTC, Jack Neely

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2007:0256	0	normal	SHIPPED_LIVE	device-mapper-multipath enhancement update	2007-05-01 17:35:42 UTC

Description Jack Neely 2006-12-22 18:00:09 UTC

Description of problem:

On the nahant-list several of us are tracking what appears to be a timing issue
with dm-multipath and EMC Clariions.  The basic problem is when you have a mount
point in the fstab:

/dev/dm-1           /disk/san           ext3    defaults 1 3 

The mount point works as expected, but the system will not boot correctly. 
During the boot process fsck will complain that it could not read the superblock
and will also give a "no such file or directory" error for /dev/dm-1.  Then you
are prompted for the root password.

Typing in the root password we are always able to fsck/mount/whatever the device
and it appears to be working normally.

Several workarounds have been suggested for this problem: inserting sleep
commands into rc.sysinit, using LVM (which adds a delay), and creating an
initrd that loads the needed modules into the kernel earlier in the boot
process.

If there's a timing issue here, then that's something that needs a better fix. 
Key points in the threads:

http://www.redhat.com/archives/nahant-list/2006-August/msg00319.html

Me specifically:
http://www.redhat.com/archives/nahant-list/2006-December/msg00192.html

Latest thread with most of our current discussion:
http://www.redhat.com/archives/nahant-list/2006-December/msg00194.html

Version-Release number of selected component (if applicable):
kernel = 2.6.9-42.0.3.ELsmp (i386)
device-mapper-multipath-0.4.5-16.1.RHEL4

Comment 1 Benji Spencer 2007-01-02 18:56:14 UTC

I am experiencing this issue as well. I tested both the "sleep" workaround and
the LVM workaround, and both avoid the problem. They are still workarounds,
though: LVM isn't always an option, and having the customer add a sleep line to
rc.sysinit isn't a good solution.

We, as customers, are looking to you, as our provider, for a solution to what
appears to be a timing issue.

Comment 3 Ben Marzinski 2007-01-03 23:34:58 UTC

This is actually a udev timing issue that can affect any multipathed device at
boot, and there is a simple way to avoid it.

The /dev/mpath directory exists because people wanted an easy way to see all
their multipathed devices, and only their multipathed devices. It is full of
symlinks to the actual /dev/dm-* devices. These symlinks are created by udev.
Unfortunately, sometimes udev doesn't create the symlink fast enough. The
/dev/mapper directory contains actual device nodes for all device mapper
devices. These device nodes are created by device-mapper itself, when it creates
the actual devices, so as soon as the device exists, they will exist.
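To see the difference described above on a running system, a small check like
the following may help. It is only a sketch: the function name describe_path
and the device name mpath0 are illustrative and do not come from this report.

```python
import os
import stat


def describe_path(path):
    """Return 'missing', 'symlink', 'block device', or 'other' for a path."""
    try:
        st = os.lstat(path)  # lstat inspects the link itself, not its target
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISBLK(st.st_mode):
        return "block device"
    return "other"


if __name__ == "__main__":
    # Hypothetical device name; substitute one that exists on your system.
    for candidate in ("/dev/mapper/mpath0", "/dev/mpath/mpath0"):
        print(candidate, "->", describe_path(candidate))
```

If the explanation in this comment holds, the /dev/mapper entry should be
reported as a block device and the /dev/mpath entry as a symlink.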

If you use /dev/mapper/<multipath_device_name> in /etc/fstab, instead of
/dev/mpath/<multipath_device_name> you should not run into this issue. So I
think this just needs some documentation fixes. It would be really helpful if
people could verify that changing the fstab line fixes their problems, though.

Comment 4 Jack Neely 2007-01-04 02:47:38 UTC

Created attachment 144762: fstab

This attachment is the /etc/fstab I am currently using.  It attempts to mount
/dev/dm-1, not a device from /dev/mpath.  Does /dev/dm-1 have the same
characteristics?

My teammate has changed the mount options a bit so it's not fsck'd on boot.  So
the fstab line from the original comment may be the most accurate one.

Comment 5 Jack Neely 2007-01-04 02:52:36 UTC

Hmmm... apparently adding an attachment doesn't kick the bug out of NEEDINFO
state.  *kick*

I'll try using /dev/mapper/mpath0p1 as the device tomorrow to confirm.

Comment 6 Ben Marzinski 2007-01-04 04:03:37 UTC

Yes, udev also creates the /dev/dm-* devices... I think.  It's possible that
device-mapper creates them too, I don't remember offhand. At any rate, try the
/dev/mapper way, and see if that fixes the problem.

There is a completely unrelated problem with using /dev/dm-* in your /etc/fstab,
however. Device mapper makes no promises that a device that is named
/dev/dm-<foo> on one boot will be named /dev/dm-<foo> on the next boot.
Multipath does guarantee that the device named /dev/mapper/<foo> and
/dev/mpath/<foo> will always have that name on that machine.
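As a rough illustration of the advice so far, the sketch below scans fstab text
and flags entries that use /dev/dm-<n> (numbering not stable across boots) or
/dev/mpath/ (udev symlinks that may not exist yet at boot). The function names
and the sample entries are made up for the example; this is not a tool shipped
with device-mapper-multipath.

```python
import re


def classify_device(spec):
    """Classify the first fstab field by the kind of device path it uses."""
    if re.fullmatch(r"/dev/dm-\d+", spec):
        return "unstable"      # dm-N numbering may change between boots
    if spec.startswith("/dev/mpath/"):
        return "udev-symlink"  # symlink created by udev, possibly late
    if spec.startswith("/dev/mapper/"):
        return "ok"            # node created by device-mapper itself
    return "other"


def check_fstab(text):
    """Return (line number, device, kind) for each entry worth changing."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        device = stripped.split()[0]
        kind = classify_device(device)
        if kind in ("unstable", "udev-symlink"):
            findings.append((lineno, device, kind))
    return findings


if __name__ == "__main__":
    with open("/etc/fstab") as handle:
        for lineno, device, kind in check_fstab(handle.read()):
            print(f"line {lineno}: {device} ({kind}); consider /dev/mapper/<name>")
```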

Even more unrelated information: If you have multiple machines accessing the
same multipathed device, and you are using the user_friendly_names
multipath.conf option (it gives you the mpath<n> names instead of the really
ugly WWID names), there is no guarantee that all the different machines will use
the same name to refer to the same device.  In order to get them to do so, run
multipath on one machine to create and name all the multipath devices and then
copy the /var/lib/multipath/bindings file from it to all the other machines.
This will cause all the machines to have the same WWID = user_friendly_name
bindings.
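Before copying the bindings file around, it may be useful to check whether two
machines already disagree. The sketch below assumes each non-comment line of
the bindings file has the form "<alias> <WWID>" and that "#" starts a comment;
the function names are hypothetical, and you would feed it the contents of
/var/lib/multipath/bindings from each host.

```python
def parse_bindings(text):
    """Map WWID -> alias from bindings-file text (assumed 'alias wwid' lines)."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            alias, wwid = parts[0], parts[1]
            bindings[wwid] = alias
    return bindings


def diff_bindings(a, b):
    """Return {wwid: (alias_on_a, alias_on_b)} for WWIDs named differently."""
    return {
        wwid: (a[wwid], b[wwid])
        for wwid in a.keys() & b.keys()
        if a[wwid] != b[wwid]
    }
```

An empty result from diff_bindings means the two hosts use the same name for
every WWID they have in common; WWIDs known to only one host are not reported.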

Comment 7 Jack Neely 2007-01-04 18:00:16 UTC

Indeed, using the following in my fstab does work.  The machine boots properly
without a timing or fsck error.

   /dev/mapper/mpath0p1    /disk/san               ext3    defaults        1 2

Comment 13 Red Hat Bugzilla 2007-05-01 17:48:14 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0256.html
