| Summary: | [NetApp CQ179680] [RH Support case 00472442] Device Mapper Multipath creates multipath device for internal hard disk | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Sean Stewart <Sean.Stewart> | ||||||||
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | ||||||||
| Status: | CLOSED WORKSFORME | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||||||
| Severity: | urgent | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | 6.1 | CC: | agk, bmarzins, bmr, coughlan, dwysocha, heinzm, jbrassow, mbroz, msnitzer, prajnoha, prockai, zkabelac | ||||||||
| Target Milestone: | rc | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2012-03-19 18:58:50 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Description
Sean Stewart
2011-05-17 16:23:34 UTC
Can you post the full output of the multipath command that produced the following line:

May 10 14:02:27 | sda: (FUJITSU:MAY2036RC) vendor/product blacklisted

Created attachment 499418 [details]
multipath.conf from partner system
Created attachment 499421 [details]
multipath -v4 -ll from partner system
Never mind, found the files I needed in the ticket.

Was the initramfs re-made after making changes to multipath.conf?

Also, the data in the sosreports does not seem to match that in comment #0 - the comment in the bug has dm-0 as the internal disk:

3500000e01453a9c0 dm-0 FUJITSU,MAY2036RC
size=34G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 1:0:0:0 sda 8:0 active ready running

Which would stand to reason if this was being created by a stale initramfs configuration. But looking at the sosreport data it's moved up to dm-1:

3500000e014719ab0 dm-1 FUJITSU,MAY2036RC
size=34G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:0:0:0 sdaw 67:0 active ready running

There's also a bunch of syntax errors in multipath.conf:

May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 4, invalid keyword: selector
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 34, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 52, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 70, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 88, invalid keyword: polling_interval
May 17 09:32:33 kswc-achilles multipathd: multipath.conf line 106, invalid keyword: polling_interval

The errors for polling_interval are caused by specifying this keyword in a device block - this is a global value for multipathd and cannot be specified on a per-device basis. The selector one is because this keyword is now "path_selector" (it's a bit confusing as there are still some examples floating around that use "selector" and I think the deprecated keyword "default_selector" is still supported but "selector" alone will trigger an error).
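As a rough illustration of the two keyword mistakes described above, here is a minimal Python sketch that scans multipath.conf text for a global-only keyword inside a device block and for the bare "selector" keyword. The function name, the keyword tables, and the sample configuration are all hypothetical, and the parsing is deliberately naive (it only understands one keyword or brace per line and "#" comments); it is not how multipathd itself validates the file.

```python
# Keywords taken from the log messages in this thread; the tables are
# illustrative assumptions, not a complete grammar for multipath.conf.
GLOBAL_ONLY = {"polling_interval"}
RENAMED = {"selector": "path_selector"}


def lint_multipath_conf(text):
    """Return (line_number, message) pairs for the two mistakes discussed above."""
    problems = []
    stack = []  # names of the currently open sections, outermost first
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop "#" comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())  # entering a section
            continue
        if line == "}":
            if stack:
                stack.pop()  # leaving the innermost section
            continue
        keyword = line.split(None, 1)[0]
        if keyword in GLOBAL_ONLY and "device" in stack:
            problems.append((lineno, f"{keyword} is global; move it to defaults"))
        if keyword in RENAMED:
            problems.append((lineno, f"{keyword} should be {RENAMED[keyword]}"))
    return problems


# Hypothetical configuration reproducing both mistakes.
SAMPLE = """\
defaults {
    selector "round-robin 0"
}
devices {
    device {
        vendor "EXAMPLE"
        polling_interval 5
    }
}
"""

for lineno, message in lint_multipath_conf(SAMPLE):
    print(lineno, message)
```

The authoritative check remains the multipathd log output quoted above; a script like this would only help spot the same two patterns before restarting the daemon.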
Below these errors in the log is a rename:

May 17 09:32:34 kswc-achilles multipathd: 3500000e014719ab0: rename 3500000e014719ab0 to mpathy

This also makes me think there might be a stale initramfs creating the rogue device here. Could you attach the initramfs image for the running kernel either here or to the support ticket? (Or re-run dracut and see if the problem goes away.)

The device changed from dm-0 to dm-1 after changing some of the parameters in multipath.conf. Sorry for the confusion, there.

I ran dracut -f and then rebooted, and it caused my system to give this message every time: "Kernel panic - not syncing: Attempted to kill init!" A couple of other RH6.1 RC2 hosts in our lab are hitting the same problem.

I reinstalled with RC3 to see if either issue could be hit. I was able to work around the problem in the bug description by blacklisting all devices, except the external storage devices, running dracut -f and rebooting. So far, I have not yet hit the kernel panic again.

Please can you include the full output of the commands that you have run, complete boot logs leading up to the panic, or the initramfs images themselves, as it is not possible to debug a boot failure like this from the limited information in comment #6. If you'd like to upload the images I'd be happy to take a look - the support ticket may be a better location, however, as the ticketing system has a larger attachment size limit (dracut images generated without -H can be quite large as they include modules for "generic" configurations).

I will have to get back to you on that, as my system is gone, but one of my coworkers did save the initramfs for one of the other systems that experienced the same problem. On it, it gave a message about not being able to load modules.dep, so it looks like during boot it was unable to figure out what drivers to load. Unwrapping the initial ramdisk showed that the modules.dep file was in fact missing from it.
In that case, I know an LSI SAS driver was compiled, depmod -a was run, and the initramfs was recreated, and then device mapper multipath was activated and dracut -f was run. In my configuration, it was up and running a cluster with 24 volumes. All I did was modify multipath.conf and issue dracut -f, then reboot. The third case where we hit this was a little more complicated. I will see how much info we can gather and submit it first thing tomorrow. Wouldn't it be more appropriate to submit it as a separate bugzilla? This one can probably be closed, if what I originally described is the expected behavior of multipath.

Thanks - that would be helpful. If modules.dep was missing, it really sounds like depmod was not run (but this sounds like a 3rd party module build, so it's hard to say), and it's impossible to know for sure without the dracut output or the resulting image. I think it would be better to keep it in the existing support ticket for now and we can file new bugs as appropriate.

Created attachment 499682 [details]
initramfs that doesn't allow the system to boot
Here's an initramfs from a system that hit the panic after issuing dracut -f. It's less than half the size of the working initramfs.
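The missing modules.dep reported earlier in this thread is the kind of thing that can be checked before rebooting. The following is a minimal Python sketch, assuming a text listing of the image contents is already available (for example from a tool such as lsinitrd, with the file path as the last field on each line). The function name and both sample listings, including the kernel version string, are hypothetical.

```python
def has_modules_dep(listing):
    """Return True if any path in an initramfs file listing ends in modules.dep."""
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: the path is the last whitespace-separated field.
        if fields and fields[-1].endswith("/modules.dep"):
            return True
    return False


# Hypothetical listings; a real one would come from inspecting the image.
GOOD = """\
-rw-r--r-- 1 root root 1234 lib/modules/2.6.32-EXAMPLE/modules.dep
-rw-r--r-- 1 root root 5678 lib/modules/2.6.32-EXAMPLE/kernel/drivers/md/dm-multipath.ko
"""
BAD = """\
-rw-r--r-- 1 root root 5678 lib/modules/2.6.32-EXAMPLE/kernel/drivers/md/dm-multipath.ko
"""

print(has_modules_dep(GOOD))
print(has_modules_dep(BAD))
```

A check like this only confirms that the file is present; it says nothing about whether depmod produced correct contents for a third-party driver.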
So, have you been able to create a proper initramfs after enabling multipath? If so, does that solve your problem?

Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker.

Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Yes, recreating the initramfs seems to solve the problem. I'd say we can go ahead and close this, and if I find another, more specific problem, I'll open a new bug.