Since the code went into 5.1 from bz #248931, this clone should be used for the continuing issues.
Uploading the console output would be useful. Do you know which programs are trying to access the passive paths? If it's only LVM, you should try filtering out the devices in /etc/lvm/lvm.conf. That will keep LVM from trying to scan them, which is what you want, since they are owned by the multipath device anyway.

If programs are accessing a passive path directly (for example by reading from /dev/sdX, where /dev/sdX is a passive path), there is no way for device-mapper-multipath to stop them. LVM does this, but you can add filters to avoid it. If other programs do it, hopefully they have a method for filtering devices as well. If the accesses are coming from multipath itself, it's either a configuration problem or a bug in the code.
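For illustration, a minimal sketch of the relevant lvm.conf stanza, assuming the local disk is /dev/sda and the multipath maps show up under /dev/mapper (the patterns will need adjusting for your device names):

devices {
    # let LVM scan the local disk and the multipath maps; skip the raw sd paths to the LUNs
    filter = [ "a|/dev/sda|", "a|/dev/mapper/|", "r|.*|" ]
}

The patterns are tried in order and the first match wins, so everything not explicitly accepted falls through to the final reject.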
I will do that and upload the output tomorrow. As for LVM, this happens even with unconfigured LUNs, before any volumes have been created on them. However, I see what you are getting at, since LVM will scan all block devices looking for LVM metadata. I tried this:

filter = [ "a/sda/", "r/.*/" ]

But the behavior is the same.
Hmm. If it's not LVM, do you have any idea which programs ARE sending IO to the passive paths?

To check whether it's multipathd (which would point to a bad checker function, either misconfigured or buggy), you can run

# multipathd -k

and then repeatedly run (you can scroll through the command history with the arrow keys)

> show paths

while watching

# tail -f /var/log/messages

This way, you should be able to see whether the error messages from the paths coincide with the path checker running on them. Use ctrl-d to exit the interactive multipathd shell.

Also, instead of removing the device-mapper-multipath module, you can just blacklist all your devices:

blacklist {
        devnode "*"
}

This will let you know whether you still see these errors even without any multipathed devices running.

Also, just to double-check that LVM scanning isn't causing these errors, you can run

# lvscan -vv

and check whether it causes any errors, and also make sure that the passive paths aren't in the list of paths that it checks.
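As a rough sketch of that workflow (two terminals; the device name is just a placeholder):

Terminal 1:
# tail -f /var/log/messages

Terminal 2:
# multipathd -k
> show paths        (repeat a few times; the up arrow recalls the command)
> show paths
(ctrl-d to exit)

If the I/O errors on a passive path (say /dev/sdd) in terminal 1 line up with the times the checker state for that path updates in the "show paths" output, then multipathd's path checker is the likely source.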
It looks like the LVM filter did in fact solve part of this problem, but not all of it. When I tested the filter above with 20 LUNs earlier, I timed the boot as usual, and when it started spewing the same errors and was 10 minutes into the boot I wrote it off as the same behavior. However, on further inspection, the filter actually did solve a lot of issues. Previously, lvscan, pvs, pvcreate, and other commands would generate these errors, but now they don't. Boot time with 20 LUNs was 13 minutes instead of 17.

So it seems there are still two other spots where the boot process hangs: 1. udev and 2. haldaemon.

Udev gives errors during "Starting udev", which makes that step take time proportional to the number of attached LUNs. But I think the worst offender is haldaemon. I turned off this service and rebooted my system; boot time with 20 LUNs was just under 6 minutes. According to an article I read online, "HAL is used for discovering storage, networking <snip>". So this seems to account for most of the boot time. After I get a login screen, if I run "service haldaemon restart" it prints tons of errors and is unavailable until it completes. Do I need to open a bug against haldaemon? What happens on other types of shared storage, such as EMC or iSCSI, with 20 LUNs?

fdisk -l generates the same errors as before, but that is because it tries to read the partition table on every device, including the passive ones.

I turned on multipathd and modified the filter to:

filter = [ "a/sda/", "a|mapper|", "r/.*/" ]

and access to the dm device worked fine.

I will attach a console log of the system booting with 20 LUNs. I added a $(date) statement at the top and bottom of rc.sysinit, which gives a good idea of the elapsed time. Also, with this many paths there are devices (sdaa, sdab, ...) whose names also start with "sda", so they were not filtered out; I need to find a better way to accept only the local disk.
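For reference, the timing instrumentation was nothing more than something along these lines (a rough sketch, not the exact lines I added):

# near the top of /etc/rc.d/rc.sysinit
echo "rc.sysinit start: $(date)" > /dev/console

# ... rest of rc.sysinit unchanged ...

# near the bottom of /etc/rc.d/rc.sysinit
echo "rc.sysinit end:   $(date)" > /dev/console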
Also, everything above was done with all devices blacklisted in multipath.conf and multipathd turned off.
Created attachment 233141 [details] serial console output of a Dell PE2950 booting with 20 LUNs attached to an MD3000 with redundant paths
*** Bug 307151 has been marked as a duplicate of this bug. ***
To filter out your sda[a-z] devices while keeping sda, you should just be able to use a filter line like:

filter = [ "a/sda$/", "a|mapper|", "r/.*/" ]

I couldn't find any straightforward way to filter devices out of haldaemon, but that doesn't mean there isn't one. You should probably open a bugzilla against hal. Either that, or you can just change the component of this bug to hal if you don't have any more multipath-specific issues.
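For comparison, how each pattern behaves on a system with enough disks to reach sdaa, sdab, and so on:

"a/sda/"     substring match, so it accepts /dev/sda but also /dev/sdaa, /dev/sdab, ...
"a/sda$/"    only accepts device names ending in "sda", i.e. /dev/sda itself
"a|mapper|"  accepts the multipath devices under /dev/mapper
"r/.*/"      rejects everything not accepted by an earlier pattern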
Are there any more multipath issues related to this bug? Otherwise I will close it.
Dell decided to go with LSI's MPP driver, so I have not been able to test this out lately. I will open a bug against haldaemon for this issue. This bug can be closed. Thank you.