Red Hat Bugzilla – Bug 195330
LVM configuration gets into an inconsistent state, causing buffer I/O errors
Last modified: 2010-01-11 21:26:17 EST
Description of problem: With LVM configured on top of device-mapper
multipathed storage, if the device names recorded in the LVM configuration
header on the disks (/dev/dm-*) do not match the UUIDs recorded with them,
buffer I/O errors occur. Correcting the device names to match the UUIDs solves
the problem.
Steps to Reproduce:
1. Use LVM to create a volume group on two dm devices, and a striped logical
volume across the two devices in the volume group.
2. Use dd to look at the LVM header on the dm device and verify that the two
devices used in the vgcreate command are shown in the configuration.
3. Create a GFS file system on the logical volume.
4. Run traffic to the GFS file system.
5. Fail an array controller so that all paths to the logical volume fail,
causing the GFS file system to withdraw.
6. Reboot the node.
7. Use dd to look at the LVM headers on the dm devices and observe that the dm
devices shown in the configuration do not match the dm devices used to create
the volume group.
8. Run traffic to the GFS file system and observe buffer I/O errors.
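Steps 2 and 7 inspect the LVM header by dumping it with dd. As a minimal sketch, assuming the standard LVM2 on-disk label layout (a "LABELONE" header of type "LVM2 001" in one of the first four 512-byte sectors, pointing at a pv_header that begins with the 32-character PV UUID), the UUID can be pulled out of a dump such as dd if=/dev/mapper/<map> bs=512 count=4. The function names are hypothetical, and the sketch does not verify the label CRC.

```python
import struct
from typing import Optional

SECTOR_SIZE = 512
LABEL_SCAN_SECTORS = 4  # the LVM2 label lives in one of the first four sectors
UUID_LEN = 32           # PV UUIDs are stored as 32 ASCII characters, no dashes


def format_lvm_uuid(raw: str) -> str:
    """Add dashes in the 6-4-4-4-4-4-6 grouping that pvs displays."""
    groups = (6, 4, 4, 4, 4, 4, 6)
    parts, pos = [], 0
    for size in groups:
        parts.append(raw[pos:pos + size])
        pos += size
    return "-".join(parts)


def read_pv_uuid(header: bytes) -> Optional[str]:
    """Return the PV UUID from an LVM2 label in `header`, or None if absent.

    `header` is the first 2048 bytes of the device (four 512-byte sectors).
    The label CRC is not checked.
    """
    for sector in range(LABEL_SCAN_SECTORS):
        label = header[sector * SECTOR_SIZE:(sector + 1) * SECTOR_SIZE]
        if len(label) < SECTOR_SIZE or label[0:8] != b"LABELONE":
            continue
        # label_header: id[8], sector u64, crc u32, offset-to-pv_header u32, type[8]
        _sector, _crc, offset = struct.unpack_from("<QII", label, 8)
        if label[24:32] != b"LVM2 001" or offset + UUID_LEN > SECTOR_SIZE:
            continue
        # pv_header starts with the raw 32-character PV UUID
        raw = label[offset:offset + UUID_LEN].decode("ascii", errors="replace")
        return format_lvm_uuid(raw)
    return None
```

Comparing this UUID with the one recorded next to each device name in the metadata is what exposes the mismatch described in step 7.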
Actual results: Buffer I/O errors
Expected results: No I/O errors
There are several things happening with this bug. Here is the scenario that
led us to recognize that the LVM metadata inconsistency was causing the buffer
I/O errors. After creating completely new PVs, VGs, and LVs on a three-node GFS
cluster, we ran traffic for two days with no problems. We then failed one of
the controllers on the storage array. As expected, two of the nodes reported
SCSI I/O errors only on the active path group, but the third node got SCSI I/O
errors on all 8 paths (4 paths in each of the active and passive path groups).
When all the paths to a LUN on the third node failed, the GFS file system
withdrew on that node. This was not the expected behavior, since only the
active controller, not the standby controller, was reset. The other two nodes
recovered from the controller reset and ran without error until we rebooted
the third node. When the third node came up and remounted the GFS file system,
all three nodes started reporting buffer I/O errors. When we looked at the LVM
metadata, we found the dm device inconsistency. After using a disk editor to
correct the inconsistency, we were again able to run for two days with no
errors. I will attach the system logs from these three nodes.
Created attachment 130918
This attachment shows the output of a pvs command indicating that the volume
group vg_igrid_01 is on /dev/dm-2 and /dev/dm-3. It also shows the output of a
dd command reading the LVM configuration data from /dev/dm-2. That data records
the volume group as being on /dev/dm-0 and /dev/dm-1. Note, however, that the
UUID recorded in the configuration for /dev/dm-1 is actually the same as the
UUID shown at the beginning of the dd output for /dev/dm-2.
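The cross-check done by hand in this attachment can be sketched as follows. It assumes the LVM2 text metadata lists each PV as a pvN { id = "..." device = "..." } block, where the device path is only a hint and the UUID is what identifies the PV, and it takes the device-to-UUID map that pvs reports as an input. The regex and helper names are illustrative assumptions, not part of LVM.

```python
import re
from typing import Dict, List, Tuple

# Assumed shape of a PV entry in LVM2 text metadata:
#   pv0 {
#       id = "xxxxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx"
#       device = "/dev/dm-0"    # Hint only
PV_ENTRY = re.compile(r'(pv\d+)\s*\{\s*id\s*=\s*"([^"]+)"\s*device\s*=\s*"([^"]+)"')


def parse_pv_hints(metadata: str) -> List[Tuple[str, str, str]]:
    """Return (pv name, PV UUID, recorded device path) for each PV entry."""
    return PV_ENTRY.findall(metadata)


def find_stale_hints(metadata: str, scanned: Dict[str, str]) -> List[str]:
    """Report PV entries whose recorded device is not where the UUID was found.

    `scanned` maps device path -> PV UUID, e.g. as listed by pvs.
    """
    by_uuid = {uuid: dev for dev, uuid in scanned.items()}
    problems = []
    for pv, uuid, hint in parse_pv_hints(metadata):
        actual = by_uuid.get(uuid)
        if actual is None:
            problems.append(f"{pv}: UUID {uuid} not found on any scanned device")
        elif actual != hint:
            problems.append(f"{pv}: metadata says {hint}, UUID is on {actual}")
    return problems
```

With metadata naming /dev/dm-0 and /dev/dm-1, and pvs finding those same UUIDs on /dev/dm-3 and /dev/dm-2, both entries would be reported as stale, which matches the situation this attachment describes.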
Created attachment 130919
Created attachment 130920
Created attachment 130921
This is one of the two nodes that recovered from the controller reset, which
occurred at about 9:00am on June 9.
Created attachment 130922
This is the third node which had the GFS file system withdrawal at about 9:00am
on June 9 and was rebooted at about 9:30am.
This sounds like it might be a multipath-related problem. Does that sound right?
Yes, it does. It looks like multipath does not always create the same dm
device for a given LUN. I noticed on the multipath-tools website that there
is a tool called devmap_name that udev can use to name the dm devices.
However, it has been commented out in /etc/udev/rules.d/50-udev.rules. Here
is the quote from http://christophe.varoqui.free.fr/wiki/wakka.php?
"The udev userspace tool is triggered upon every block sysfs entry creation
and suppression, and assume the responsibility of the associated device node
creation and naming. Udev default naming policies can be complemented by add-
on scripts or binaries. As it does not currently have a default policy for
device maps naming, we plug a little tool named devmap_name that resolve the
sysfs dm-[0-9]* names in map names as set at map creation time. Provided the
map naming is rightly done, this plugin provides the naming stability and
meaningfulness required for a proper multipath implementation."
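As quoted above, devmap_name resolves sysfs dm-[0-9]* names into the map names set at creation time. A rough Python equivalent is sketched below, assuming the kernel exposes each map's name in /sys/block/dm-N/dm/name; that attribute is present on newer kernels but may be missing on kernels of this report's era. dm_names is a hypothetical helper, not part of multipath-tools or udev.

```python
import os
from typing import Dict


def dm_names(sysfs_block: str = "/sys/block") -> Dict[str, str]:
    """Map kernel dm-N names to device-mapper map names via sysfs.

    Assumes /sys/block/dm-N/dm/name exists; devices without it are skipped.
    """
    names = {}
    for entry in sorted(os.listdir(sysfs_block)):
        if not entry.startswith("dm-"):
            continue  # only device-mapper block devices are of interest
        name_file = os.path.join(sysfs_block, entry, "dm", "name")
        try:
            with open(name_file) as f:
                names[entry] = f.read().strip()
        except FileNotFoundError:
            continue  # attribute not available for this device or kernel
    return names
```

The dm-N numbers can change between boots, while the map name (mpath0 or the WWID) stays tied to the LUN. That stable name is what should be handed to LVM.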
Any idea why that was commented out in this release?
I have no idea. This needs assigning to someone who knows about multipath!
You should never use /dev/dm-* to reference the devices. Those are not
persistent names. Since you have user-friendly names turned on, you should be
able to use the user-friendly multipath names (/dev/mpath/mpath*). These are
stable per machine (i.e. on nodeA, if a multipathed device is assigned mpath0,
it will always be assigned mpath0). To make them consistent across the cluster,
start up multipathing on one machine, then copy the /var/lib/multipath/bindings
file from that machine to all the other machines in the cluster. All the
machines will then use the same user-friendly names for the devices. Otherwise,
you can turn the user-friendly names feature off and just refer to the devices
by their multipath-assigned WWIDs.
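After copying the bindings file, one way to confirm that two nodes really agree is to compare their alias-to-WWID maps. This sketch assumes the bindings file holds one "alias WWID" pair per line, with # starting a comment. The helper names are hypothetical, and the file contents would be read from each node's /var/lib/multipath/bindings by whatever means is convenient.

```python
from typing import Dict, List


def parse_bindings(text: str) -> Dict[str, str]:
    """Parse multipath bindings text into {alias: WWID}, skipping comments."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            bindings[fields[0]] = fields[1]
    return bindings


def binding_conflicts(reference: str, other: str) -> List[str]:
    """List aliases that are missing or map to different WWIDs on two nodes."""
    ref, oth = parse_bindings(reference), parse_bindings(other)
    conflicts = []
    for alias in sorted(ref.keys() | oth.keys()):
        if ref.get(alias) != oth.get(alias):
            conflicts.append(f"{alias}: {ref.get(alias)} vs {oth.get(alias)}")
    return conflicts
```

An empty result means every alias, such as mpath0, refers to the same LUN on both nodes.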
I have been unable to recreate this. Are you able to reproduce either the I/O
errors or the all-paths failure?
We have made changes in our system to prevent the LVM metadata inconsistency
from occurring. We are not seeing buffer I/O errors or path failures now.
Feel free to close this bug if you like.