This bugzilla is for support case 00472442. I am entering this bugzilla to see if I can get some others to look at this before the OS GA's.

Details of the problem:
I fresh installed RHEL 6.1 on a host with a single internal hard disk, without multipath enabled. After this, I set up multipath to start on the host. Before activating multipath, the output of "vgs --options vg_name,pv_name" shows that the root LUN is on disk /dev/sda2. After activating multipath, the internal disk gets virtualized into /dev/dm-0, and the output of the vgs command shows that the root volume group is built on the multipath device. If I explicitly blacklist the internal disk, a message shows that the device is blacklisted, but the multipath device is created anyway. If I chkconfig multipathd off and reboot, multipath still activates and virtualizes the internal disk.

Configuration Details:
This has been observed on at least five different servers with different brands of internal hard disks. Every attempt to prevent the internal disk from being virtualized has failed.

Steps to Reproduce:
1. Fresh install RHEL 6.1 RC2
2. Set up multipath.conf, activate multipath, reboot

Expected Results:
After activating multipath, it should virtualize only the external storage disks with vendor/product ID LSI INF-01-00 and ignore the internal disk, based on the blacklist rules.

Actual Results:
A multipath device is created from the internal disk. If user-friendly names are used, the blacklist prevents a user-friendly name from being assigned to the internal disk, but there is still a mapping for the internal disk.

Logs:

vgs --options vg_name,pv_name

BEFORE activating multipath:
  PV         VG
  /dev/sda2  vg_dhcp135157577

AFTER activating multipath:
  VG                PV
  vg_dhcp135157577  /dev/mapper/3500000e01453a9c0p2

Output of multipath showing the device being blacklisted:
May 10 14:02:27 | sda: (FUJITSU:MAY2036RC) vendor/product blacklisted

The multipath device:
3500000e01453a9c0 dm-0 FUJITSU,MAY2036RC
size=34G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 1:0:0:0 sda 8:0 active ready running

This multipath device shows up no matter what blacklist settings I use in /etc/lvm/lvm.conf and /etc/multipath.conf. I never saw this problem in RHEL 6.0, which implies there was a fairly big change in multipath between 6.0 and 6.1.
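For reference, the kind of blacklist I have been trying in /etc/multipath.conf looks roughly like this (the WWID and vendor/product values are taken from the output above; this is a sketch, not my exact file):

blacklist {
        # internal disk by WWID
        wwid "3500000e01453a9c0"
        # or by vendor/product
        device {
                vendor  "FUJITSU"
                product "MAY2036RC"
        }
}

The device entry is what produces the "vendor/product blacklisted" message shown above, yet the dm device is still created.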
Can you post the full output of the multipath command that produced the following line: May 10 14:02:27 | sda: (FUJITSU:MAY2036RC) vendor/product blacklisted
Created attachment 499418 [details] multipath.conf from partner system
Created attachment 499421 [details] multipath -v4 -ll from partner system
Never mind, found the files I needed in the ticket.

Was the initramfs re-made after making changes to multipath.conf? Also, the data in the sosreports does not seem to match that in comment #0 - the comment in the bug has dm-0 as the internal disk:

3500000e01453a9c0 dm-0 FUJITSU,MAY2036RC
size=34G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 1:0:0:0 sda 8:0 active ready running

Which would stand to reason if the device were being created by a stale initramfs configuration. But looking at the sosreport data, it has moved up to dm-1:

3500000e014719ab0 dm-1 FUJITSU,MAY2036RC
size=34G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:0 sdaw 67:0 active ready running

There are also a number of syntax errors in multipath.conf:

May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 4, invalid keyword: selector
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 34, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 52, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 70, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 88, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 106, invalid keyword: polling_interval

The polling_interval errors are caused by specifying this keyword in a device block - it is a global value for multipathd and cannot be set on a per-device basis. The selector error is because this keyword is now "path_selector" (it's a bit confusing, as there are still some examples floating around that use "selector"; I think the deprecated keyword "default_selector" is still supported, but "selector" alone will trigger an error).

Below these is a rename:

May 17 09:32:34 kswc-achilles multipathd: 3500000e014719ab0: rename 3500000e014719ab0 to mpathy

This also makes me think there might be a stale initramfs creating the rogue device here. Could you attach the initramfs image for the running kernel, either here or to the support ticket? (Or re-run dracut and see if the problem goes away.)
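As a rough sketch of the corrected placement (not your complete configuration - the polling_interval value below is just an example):

defaults {
        # polling_interval is a global multipathd setting, so it belongs here
        polling_interval 10
}
devices {
        device {
                vendor        "LSI"
                product       "INF-01-00"
                # "selector" is no longer accepted; the keyword is path_selector
                path_selector "round-robin 0"
        }
}

After fixing the file, regenerate the initramfs with dracut -f so the copy embedded in the image matches the one on disk.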
The device changed from dm-0 to dm-1 after I changed some of the parameters in multipath.conf. Sorry for the confusion there.

I ran dracut -f and then rebooted, and every boot attempt now fails with: "Kernel panic - not syncing: Attempted to kill init!" A couple of other RHEL 6.1 RC2 hosts in our lab are hitting the same problem. I reinstalled with RC3 to see whether either issue would still occur. I was able to work around the problem in the bug description by blacklisting all devices except the external storage devices, running dracut -f, and rebooting. So far, I have not hit the kernel panic again.
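The workaround, in rough form, was something like the following in /etc/multipath.conf, followed by dracut -f and a reboot. The exception WWID below is a placeholder - there is one wwid line per external volume - and, as far as I understand, an exception only overrides a blacklist entry of the same type, so a device-based exception would not override a wwid blacklist:

blacklist {
        wwid ".*"
}
blacklist_exceptions {
        # one line per external LUN; the WWID below is a placeholder
        wwid "36001234567890abcdef000000000001"
}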
Could you please include the full output of the commands you ran, complete boot logs leading up to the panic, or the initramfs images themselves? It is not possible to debug a boot failure like this from the limited information in comment #6. If you'd like to upload the images, I'd be happy to take a look - the support ticket may be a better location, however, as the ticketing system has a larger attachment size limit (dracut images generated without -H can be quite large, as they include modules for "generic" configurations).
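If attachment size is a concern, a host-only image is much smaller; something along these lines should work (the image path shown is the RHEL 6 default):

dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)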
I will have to get back to you on that, as my system is gone, but one of my coworkers saved the initramfs from one of the other systems that hit the same problem. That system gave a message during boot about not being able to load modules.dep, so it looks like it was unable to figure out which drivers to load. Unwrapping the initial ramdisk showed that the modules.dep file was in fact missing from it. In that case, I know an LSI SAS driver was compiled, depmod -a was run, the initramfs was recreated, and then device-mapper multipath was activated and dracut -f was run. In my configuration, the host was up and running a cluster with 24 volumes; all I did was modify multipath.conf, issue dracut -f, and reboot. The third case where we hit this was a little more complicated. I will see how much info we can gather and submit it first thing tomorrow. Wouldn't it be more appropriate to submit it as a separate bugzilla? This one can probably be closed if what I originally described is the expected behavior of multipath.
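For reference, this is roughly how the image was inspected (paths assume the default RHEL 6 initramfs, which is a gzip-compressed cpio archive):

mkdir /tmp/initrd-unpack && cd /tmp/initrd-unpack
zcat /boot/initramfs-$(uname -r).img | cpio -idmv
# this is the file that was missing from the broken image
ls -l lib/modules/$(uname -r)/modules.dep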
Thanks - that would be helpful. If modules.dep was missing, it really sounds like depmod was not run (though this sounds like a third-party module build, so it's hard to say), but it's impossible to know for sure without the dracut output or the resulting image. I think it would be better to keep this in the existing support ticket for now, and we can file new bugs as appropriate.
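For reference, the sequence I would expect after building a third-party module looks something like this (the module name is only an example - substitute the actual LSI driver):

# rebuild module dependency data for the running kernel
depmod -a
# confirm the module can be resolved (example name)
modinfo mpt2sas
# regenerate the initramfs for the running kernel
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)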
Created attachment 499682 [details] initramfs that doesn't allow the system to boot Here's an initramfs from a system that hit the panic after issuing dracut -f. It's less than half the size of the working initramfs.
So, have you been able to create a proper initramfs after enabling multipath? If so, does that solve your problem?
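One quick sanity check, assuming lsinitrd from the dracut package is available, is to confirm the new image actually carries the multipath configuration and the module metadata:

lsinitrd /boot/initramfs-$(uname -r).img | grep -E 'multipath|modules.dep'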
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Yes, recreating the initramfs seems to solve the problem. I'd say we can go ahead and close this, and if I find another, more specific problem, I'll open a new bug.