Red Hat Bugzilla – Bug 309891
Improve support for RDAC storage to dm-multipath - MD3000 failback fails
Last modified: 2010-01-11 21:39:17 EST
Since code went into 5.1. from bz #248931, this clone should be used for the
Uploading the console output would be useful.
Do you know what programs are trying to access the passive paths? If it's only
LVM, you should try filtering out the devices in /etc/lvm/lvm.conf. This will
keep LVM from trying to scan them, which is what you want, since they are owned
by the multipath device anyway.
If programs are accessing the passive path directly (for example by reading from
/dev/sdX, where /dev/sdX is a passive path) there is no way that
device-mapper-multipath can stop them. LVM does this, but you can add filters to
avoid it. If there are other programs that do it, they hopefully have a method
for filtering devices as well.
If the accesses are coming from multipath, it's either a configuration problem,
or a bug in the code.
I will do that and upload the output tomorrow.
As for LVM, this happens even with unconfigured LUNs before creating any
volumes on them. However, I see what you are getting at, since LVM will scan
all block devices looking for physical volumes. I tried this:
filter = [ "a/sda/", "r/.*/" ]
But the behavior is the same.
Hmm. If it's not LVM, do you have any idea what programs ARE sending IO to the
passive paths? To check if it's multipathd (which would point to a bad checker
function, either misconfigured or buggy) you can run
# multipathd -k
and then repeatedly run (you can scroll through the command history with the
> show paths
# tail -f /var/log/messages
This way, you should be able to see if the error messages from the paths
coincide with the path checker running on them. Use ctrl-d to exit the
interactive multipathd shell.
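The check described above can also be scripted instead of typed repeatedly. The following is only a minimal sketch, assuming a multipathd build that accepts a command directly after -k (as in multipathd -k"show paths"). poll_cmd is a hypothetical helper name, and the timestamp format is chosen to resemble the one used in /var/log/messages so the two outputs can be lined up by eye.

```shell
#!/bin/bash
# poll_cmd COUNT INTERVAL CMD [ARGS...]
# Hypothetical helper: run CMD COUNT times, INTERVAL seconds apart, and
# print a syslog-style timestamp before each run.
poll_cmd() {
    local count=$1 interval=$2 i
    shift 2
    for ((i = 0; i < count; i++)); do
        printf '=== %s ===\n' "$(date '+%b %e %H:%M:%S')"
        "$@"
        sleep "$interval"
    done
}

# Example (as root, with multipathd running), while
# `tail -f /var/log/messages` runs in a second terminal:
#   poll_cmd 12 5 multipathd -k"show paths"
```

If the path errors in the log only appear right after a timestamped "show paths" run or a checker interval, that would point at multipathd rather than at some other program.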
Also, instead of removing the device-mapper-multipath module, you can just
blacklist all your devices
This will let you know if you can see these errors even without any multipathed
Also, just to double-check that LVM scanning isn't causing these errors, you
can run
# lvscan -vv
And check to see if it causes any errors, and also check to make sure that the
passive paths aren't listed in the list of paths that it checks.
It looks like the LVM filter did in fact solve part of this problem, but not all
of it. When I tested the filter above with 20 LUNs earlier, I timed the boot
like usual, and when it started spewing the same errors and was into 10 minutes
of boot time I wrote it off as the same behavior.
However, on further inspection, it looks like the filter actually did solve a
lot of issues. Previously, lvscan, pvs, pvcreate, and other commands would
generate these errors, but now they don't. Boot time with 20 LUNs was 13
minutes instead of 17.
So it seems there are still two other spots where the boot process hangs: 1. udev
and 2. haldaemon.
Udev gives errors during "Starting udev", which adds boot time in proportion to
the number of attached LUNs.
But I think the worst offender is haldaemon. I turned off this service, and
rebooted my system. Boot time with 20 LUNs was just under 6 minutes.
According to an article I read online, "HAL is used for discovering storage,
networking <snip>". So this seems to account for most of the boot time.
After I get a login screen, if I run "service haldaemon restart" it will print
tons of errors and be unavailable until complete.
Do I need to open a bug under haldaemon? What happens on other types of shared
storage such as EMC or iSCSI and 20 LUNs?
fdisk -l generates the same errors as before, but that is because it is trying
to list the partition table on each device, including the passive ones.
I turned on multipathd and modified the filter to:
filter = [ "a/sda/", "a|mapper|", "r/.*/" ]
And access to the dm device worked fine.
I will attach a console log of the system booting with 20 LUNs. I added a
$(date) statement at the top and bottom of rc.sysinit, which gives a good idea
of elapsed time. Also, there are so many paths that some device names beyond
the local disk also start with /dev/sda, so those devices were not filtered
out. I need to find a better way to accept only the local disk.
Also, everything above was done with all devices blacklisted in multipath.conf
and multipathd off.
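The boot timing mentioned above could be made a little easier to read by recording epoch seconds rather than plain $(date) output. This is a minimal sketch, not what was actually added to rc.sysinit; fmt_elapsed, boot_start and boot_end are hypothetical names, and it assumes both timestamps are taken in the same shell script.

```shell
#!/bin/bash
# fmt_elapsed START END
# Hypothetical helper: print the difference between two epoch timestamps
# (seconds) as minutes and seconds.
fmt_elapsed() {
    local secs=$(( $2 - $1 ))
    printf '%dm%02ds\n' $(( secs / 60 )) $(( secs % 60 ))
}

boot_start=$(date +%s)   # would go at the top of the script being timed
# ... the work being timed runs here ...
boot_end=$(date +%s)     # would go at the bottom
echo "elapsed: $(fmt_elapsed "$boot_start" "$boot_end")"
```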
Created attachment 233141
serial console output of a Dell PE2950 booting with 20 LUNs attached to an MD3000 with redundant paths
*** Bug 307151 has been marked as a duplicate of this bug. ***
To filter out your sda[a-z] devices, you should just be able to use a filter like:
filter = [ "a/sda$/", "a|mapper|", "r/.*/" ]
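To see why the trailing $ matters, here is a rough model of how such a filter list is evaluated. It is a simplified sketch, not LVM's actual code: it assumes patterns are tried in order, the first regex that matches the device name decides, and a name matching nothing is accepted. It uses bash's regex engine rather than LVM's, and ignores the fact that a device can have several names. lvm_filter_decision and the device name /dev/mapper/mpath0 are hypothetical.

```shell
#!/bin/bash
# lvm_filter_decision DEVICE PATTERN...
# Simplified model of an lvm.conf filter list: each PATTERN is an action
# letter (a = accept, r = reject), a delimiter, a regex, and the delimiter
# again. The first regex that matches DEVICE decides the outcome.
lvm_filter_decision() {
    local dev=$1 pat regex
    shift
    for pat in "$@"; do
        # Drop the leading a/r and the two delimiter characters.
        regex=${pat:2:${#pat}-3}
        if [[ $dev =~ $regex ]]; then
            case ${pat:0:1} in
                a) echo accept ;;
                r) echo reject ;;
            esac
            return 0
        fi
    done
    # Nothing matched: the model accepts the device.
    echo accept
}

# Compare the unanchored filter with the anchored one.
for dev in /dev/sda /dev/sdaa /dev/sdb /dev/mapper/mpath0; do
    printf '%-20s old=%s new=%s\n' "$dev" \
        "$(lvm_filter_decision "$dev" 'a/sda/' 'a|mapper|' 'r/.*/')" \
        "$(lvm_filter_decision "$dev" 'a/sda$/' 'a|mapper|' 'r/.*/')"
done
```

In this model the unanchored a/sda/ accepts /dev/sdaa because "sda" appears inside the name, while a/sda$/ accepts only names that end in "sda". Note that under the same model a partition such as /dev/sda2 would not match sda$ either, so if the local physical volume lives on a partition the pattern may need adjusting.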
I couldn't find any straightforward way to filter the haldaemon, but that
doesn't mean that there isn't one. You should probably open a bugzilla against
hal. Either that or you can just change the component of this bug to hal, if you
don't have any more multipath specific issues.
Are there any more multipath issues related to this bug? Otherwise I will close
this bug.
Dell decided to go with LSI's MPP driver, so I have not been able to test this
out lately. I will open a bug with haldaemon about this. This bug can be closed.