Bug 195330 - lvm configuration gets into inconsistent state causing buffer i/o errors
Summary: lvm configuration gets into inconsistent state causing buffer i/o errors
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact: Cluster QE
Depends On:
Reported: 2006-06-14 20:09 UTC by Henry Harris
Modified: 2010-01-12 02:26 UTC (History)

Clone Of:
Last Closed: 2006-08-02 21:50:31 UTC

Attachments
Additional info (8.19 KB, application/octet-stream)
2006-06-14 20:21 UTC, Henry Harris
Multipath configuration (1.85 KB, text/plain)
2006-06-14 20:23 UTC, Henry Harris
LVM configuration (10.26 KB, text/plain)
2006-06-14 20:24 UTC, Henry Harris
System log (1.50 MB, text/plain)
2006-06-14 20:26 UTC, Henry Harris
System log (1.55 MB, text/plain)
2006-06-14 20:27 UTC, Henry Harris

Description Henry Harris 2006-06-14 20:09:20 UTC
Description of problem: With LVM configured on top of device-mapper 
multipath storage, if the device names recorded in the LVM configuration 
header on the disks (/dev/dm-*) do not match the UUIDs recorded alongside 
them, buffer I/O errors occur.  Correcting the device names to match the 
UUIDs solves the problem.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Use LVM to create a volume group on two dm devices and a striped logical 
   volume across the two devices in the volume group.
2. Use dd to look at the LVM header on the dm device and verify that the two 
   devices used in the vgcreate command are shown in the configuration.
3. Create a GFS file system on the logical volume.
4. Run traffic to the GFS file system.
5. Fail an array controller to cause all paths to the logical volume to fail, 
   which causes the GFS file system to withdraw.
6. Reboot the node.
7. Use dd to look at the LVM headers on the dm devices and observe that the 
   dm devices shown in the configuration do not match the dm devices used to 
   create the volume group.
8. Run traffic to the GFS file system and observe buffer I/O errors.

Actual results:
Buffer i/o errors

Expected results:
No i/o errors

Additional info: 
There are several things happening with this bug.  Here is the scenario that 
led us to recognize that the LVM metadata inconsistency was causing the 
buffer I/O errors.  After creating completely new PVs, VGs, and LVs on a 
three-node GFS cluster, we ran traffic for two days with no problems.  We 
then failed one of the controllers on the storage array.  Two of the nodes 
reported SCSI I/O errors on the active path group only, as expected, but the 
third node got SCSI I/O errors on all 8 paths (4 paths on each of the active 
and passive path groups).  When all the paths to a LUN on the third node 
failed, the GFS file system withdrew on that node.  This was not the 
expected behavior, since only the active controller was reset and not the 
standby controller.  The other two nodes recovered from the controller reset 
and ran without error until we rebooted the third node.  When the third node 
came up and remounted the GFS file system, all three nodes started reporting 
buffer I/O errors.  When we looked at the LVM metadata, we found the dm 
device inconsistency.  After using a disk editor to correct the 
inconsistency, we were again able to run for two days with no errors.  I will 
attach the system logs from these three nodes.

Comment 1 Henry Harris 2006-06-14 20:21:52 UTC
Created attachment 130918 [details]
Additional info

This attachment shows the output of a pvs command indicating that the volume
group vg_igrid_01 is on /dev/dm-2 and /dev/dm-3.  It also shows the output of a
dd command reading from /dev/dm-2 which has the lvm configuration data.  This
shows that the volume group is on /dev/dm-0 and /dev/dm-1.  Note however that
the UUID shown in the configuration for /dev/dm-1 is actually the same as the
UUID shown at the beginning of the dd output for /dev/dm-2.
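
The comparison described above can be sketched in a few lines of Python. This
is only an illustration, not a tool used in this report: the metadata excerpt,
the UUIDs, and the helper names (recorded_devices, find_mismatches) are made
up, and the "current" mapping is assumed to have been transcribed by hand from
something like `pvs -o pv_name,pv_uuid`. The pairing of /dev/dm-1 with
/dev/dm-2 follows the attachment; the pairing of /dev/dm-0 with /dev/dm-3 is an
assumption.

```python
import re

# Hypothetical excerpt of the LVM text metadata as seen in the dd output.
# The UUIDs are invented for illustration.
SAMPLE_METADATA = '''
vg_igrid_01 {
    physical_volumes {
        pv0 {
            id = "AAAAAA-0000-0000-0000-0000-0000-AAAAAA"
            device = "/dev/dm-0"
        }
        pv1 {
            id = "BBBBBB-1111-1111-1111-1111-1111-BBBBBB"
            device = "/dev/dm-1"
        }
    }
}
'''

# Matches a pvN block and captures its name, UUID, and recorded device path.
PV_BLOCK = re.compile(
    r'(pv\d+)\s*\{\s*id\s*=\s*"([^"]+)"\s*device\s*=\s*"([^"]+)"'
)


def recorded_devices(metadata_text):
    """Map each PV UUID to the device path recorded in the metadata text."""
    return {uuid: dev for _name, uuid, dev in PV_BLOCK.findall(metadata_text)}


def find_mismatches(metadata_text, current_devices):
    """List (uuid, recorded, current) for PVs whose recorded device differs
    from the device on which that UUID is found now."""
    recorded = recorded_devices(metadata_text)
    return [
        (uuid, recorded[uuid], current)
        for uuid, current in sorted(current_devices.items())
        if uuid in recorded and recorded[uuid] != current
    ]


# Hypothetical current UUID-to-device mapping (assumed, see lead-in).
CURRENT = {
    "AAAAAA-0000-0000-0000-0000-0000-AAAAAA": "/dev/dm-3",
    "BBBBBB-1111-1111-1111-1111-1111-BBBBBB": "/dev/dm-2",
}

for uuid, old, new in find_mismatches(SAMPLE_METADATA, CURRENT):
    print(f"{uuid}: metadata says {old}, currently on {new}")
```

Keying the comparison on the UUID rather than on the dm-N name is the point of
the sketch, since the dm-N number is the part that changed across the reboot.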

Comment 2 Henry Harris 2006-06-14 20:23:38 UTC
Created attachment 130919 [details]
Multipath configuration

Comment 3 Henry Harris 2006-06-14 20:24:29 UTC
Created attachment 130920 [details]
LVM configuration

Comment 4 Henry Harris 2006-06-14 20:26:06 UTC
Created attachment 130921 [details]
System log

This is one of the two nodes that recovered from the controller reset, which
occurred at about 9:00am on June 9.

Comment 5 Henry Harris 2006-06-14 20:27:25 UTC
Created attachment 130922 [details]
System log

This is the third node which had the GFS file system withdrawal at about 9:00am
on June 9 and was rebooted at about 9:30am.

Comment 6 Christine Caulfield 2006-06-15 14:23:38 UTC
This sounds like it might be a multipath-related problem. Does that sound
reasonable?

Comment 7 Henry Harris 2006-06-15 14:51:35 UTC
Yes, it does.  It looks like multipath does not always create the same dm 
device for a given LUN.  I noticed on the multipath tools website that there 
is something called devmap_name that udev can use to name the dm devices.  
However, this has been commented out in /etc/udev/rules.d/50-udev.rules.  Here 
is the quote from http://christophe.varoqui.free.fr/wiki/wakka.php?

"The udev userspace tool is triggered upon every block sysfs entry creation 
and suppression, and assume the responsibility of the associated device node 
creation and naming. Udev default naming policies can be complemented by add-
on scripts or binaries. As it does not currently have a default policy for 
device maps naming, we plug a little tool named devmap_name that resolve the 
sysfs dm-[0-9]* names in map names as set at map creation time. Provided the 
map naming is rightly done, this plugin provides the naming stability and 
meaningfulness required for a proper multipath implementation."

Any idea why that was commented out in this release?

Comment 8 Christine Caulfield 2006-06-15 15:21:01 UTC
I have no idea. This needs assigning to someone who knows about multipath!

Comment 10 Ben Marzinski 2006-06-19 23:29:25 UTC
You should never use /dev/dm-* to reference the devices.  Those are not
permanent names. Since you have user friendly names turned on, you should be
able to use the user friendly multipath names (/dev/mpath/mpath*). These are
stable per machine (i.e. on nodeA, if a multipathed device is assigned mpath0,
it will always be assigned mpath0). To make them consistent across the cluster,
start up multipathing on one machine. Then copy the /var/lib/multipath/bindings
file from that machine to all the other machines in the cluster. Then all the
machines will use the same user friendly names for devices. Otherwise, you can
turn the user friendly names feature off, and just refer to the devices by
their multipath-assigned WWIDs.
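
As a rough way to check that the bindings really are identical across nodes
after copying, something like the following Python sketch could be used. It
assumes the bindings file holds one "alias wwid" pair per line with "#"
comments; the WWIDs and the helper names (parse_bindings, binding_conflicts)
are made up for illustration.

```python
def parse_bindings(text):
    """Parse bindings-file text into {alias: wwid}.

    Assumes one "alias wwid" pair per line; "#" starts a comment.
    """
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip malformed lines rather than guess
        alias, wwid = parts
        bindings[alias] = wwid
    return bindings


def binding_conflicts(node_a_text, node_b_text):
    """Return {alias: (wwid_on_a, wwid_on_b)} for every alias that is bound
    differently, or is missing, on one of the two nodes."""
    a = parse_bindings(node_a_text)
    b = parse_bindings(node_b_text)
    return {
        alias: (a.get(alias), b.get(alias))
        for alias in sorted(set(a) | set(b))
        if a.get(alias) != b.get(alias)
    }


# Hypothetical bindings from two nodes; the WWIDs are invented.
NODE_A = """# alias wwid
mpath0 3600a0b80000f0001
mpath1 3600a0b80000f0002
"""
NODE_B = """# alias wwid
mpath0 3600a0b80000f0002
mpath1 3600a0b80000f0001
"""

for alias, (on_a, on_b) in binding_conflicts(NODE_A, NODE_B).items():
    print(f"{alias}: nodeA={on_a} nodeB={on_b}")
```

An empty result means both nodes map every alias to the same WWID, which is the
state the copy step is meant to produce.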

Comment 11 Ben Marzinski 2006-07-10 21:08:28 UTC
I have been unable to recreate this. Are you able to reproduce either the I/O
errors or the all-paths failure?

Comment 12 Henry Harris 2006-08-02 16:56:59 UTC
We have made changes in our system to prevent the LVM metadata inconsistency 
from occurring. We are not seeing buffer I/O errors or path failures now.  
Feel free to close this bug if you like.  
