Bug 597268
| Summary: | lvm devices are not initialized in kdump kernel | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Boris Ranto <branto> |
| Component: | kexec-tools | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Boris Ranto <branto> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 6.0 | CC: | antillon.maurizio, phan, qcai, syeghiay |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kexec-tools-2.0.0-121 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-11-10 21:00:10 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Could you please provide a sosreport of this system when it's running normally? Thanks!
Created attachment 419356 [details]
sosreport for the system
OK, this is from the latest system build. Hope it helps.
Grumble. This looks like a combination of high latency in the megaraid scan and another device getting scanned early that satisfies the critical disk requirement. In the console log we can see the SCSI bus scan start right after the module is loaded, but the drives on that controller aren't detected until after we drop to a shell. Since a SCSI CD-ROM drive on another bus takes the sda name, we pass the critical disks check. Is the problem corrected if you add ata_piix to the module blacklist in kdump.conf?
Created attachment 426281 [details]
Console output when blacklisting sd_mod
I finally managed to find out which module the drive uses (it was sd_mod), but the result is not very promising. When I blacklist sd_mod, the system gets stuck for a very long time (I waited about half an hour with no progress). From the log, it looks like it is waiting for device sda.
You don't want to blacklist sd_mod (that module enables all of your SCSI-exported drives); as I noted above, you want to blacklist the ata_piix module.
Created attachment 427602 [details]
Blacklisted ata_piix and ata_generic
If I blacklisted just ata_piix, nothing changed. I tried to blacklist ata_generic too but again, no change.
OK, thanks. It just occurred to me that this might be a different case of a known problem that we recently fixed in RHEL6. Could you please try the attached package and see if it clears the issue for you?
Created attachment 427693 [details]
test package
Created attachment 427937 [details]
Log with new kexec tools
I've installed the test package and tested both with and without the ata_piix module blacklisted, but there was no significant improvement. The result is still the same.
Grr, OK, this is something new then. Let me see if we can do SCSI device mapping here by hand. Until then, you can manually update mkdumprd to pause for a couple of minutes. That will give sdb an opportunity to be detected so that LVM can assemble all of your devices. Just add this:

    emit "sleep 120"

after the line in /sbin/mkdumprd that contains the string "Making device-mapper control node".
I added the sleep, but it didn't help. I suspect the problem is that in the normal kernel the device is detected as sda, not sdb; sdb is the "Attached SCSI removable disk":

    sd 2:2:0:0: [sda] 71024640 512-byte logical blocks: (36.3 GB/33.8 GiB)
    sd 2:2:0:0: [sda] Write Protect is off
    sd 2:2:0:0: [sda] Mode Sense: 00 00 00 00
    sd 2:2:0:0: [sda] Asking for cache data failed
    sd 2:2:0:0: [sda] Assuming drive cache: write through
    sd 2:2:0:0: [sda] Asking for cache data failed
    sd 2:2:0:0: [sda] Assuming drive cache: write through
     sda: sda1 sda2
    sd 2:2:0:0: [sda] Asking for cache data failed
    sd 2:2:0:0: [sda] Assuming drive cache: write through
    sd 3:0:0:0: [sdb] Attached SCSI removable disk
    sd 2:2:0:0: [sda] Attached SCSI disk

According to the next line, the logical volumes are on sda2:

    dracut: Scanning devices sda2 for LVM logical volumes vg_dellpe285004/lv_root vg_dellpe285004/lv_swap

Another thing I don't like is that even though sdb is eventually initialized, no /dev/sdb* device nodes exist:

    / # ls /dev/sd*
    /dev/sda    /dev/sda11  /dev/sda14  /dev/sda2   /dev/sda5   /dev/sda8
    /dev/sda1   /dev/sda12  /dev/sda15  /dev/sda3   /dev/sda6   /dev/sda9
    /dev/sda10  /dev/sda13  /dev/sda16  /dev/sda4   /dev/sda7
    / #

That's part of the problem, but LVM should handle that as long as sdb is detected before the block devices are created... which is the problem. Sorry, I told you the wrong place to insert the sleep. Instead of putting it at the "Making device-mapper control node" line, put it right after the "Creating Block Devices" line. That will allow the driver to detect sdb and register it in sysfs, which in turn will allow the init script to build the device node in /dev.
OK, with the sleep in the new place, kdump works.
OK, good, that can be your workaround for now. I'll work on putting together a smarter disk mapping.
Created attachment 429128 [details]
patch to detect UUIDs for compatible devices
OK, it's not perfect, and I haven't tested it yet, but this should allow mkdumprd to search for devices by UUID, for those devices that support UUID assignment (most or all disk drives). If you could give this a spin and see if it fixes your problem, that would be a big help to me. Thanks!
Created attachment 429764 [details]
Log with patch
I've patched /sbin/mkdumprd, but it didn't quite help.
I guess the reason is this line from the log:
`Usage: msh LABEL=<label>|UUID=<uuid>`
Created attachment 429803 [details]
new version of patch
Sorry, I missed escaping a few `$` symbols.
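Why the escaping matters, as a minimal sketch: mkdumprd generates the init script for the kdump initramfs, so a `$` inside an emitted string is expanded when mkdumprd runs unless it is escaped. A variable that is unset at build time then becomes an empty argument, which is consistent with the msh usage error reported above. The `emit` below is a hypothetical stand-in that appends to a temp file; it is not mkdumprd's own definition.

```shell
#!/bin/sh
# Hypothetical stand-in for mkdumprd's emit(): append one line to the
# generated init script (a temp file here) instead of running it now.
SCRIPT=$(mktemp)
trap 'rm -f "$SCRIPT"' EXIT
emit() {
    printf '%s\n' "$*" >> "$SCRIPT"
}

unset uuid   # not set while the init script is being generated

emit "findfs UUID=$uuid"    # expanded by the generator: writes "findfs UUID="
emit "findfs UUID=\$uuid"   # escaped: the init script keeps "$uuid" for boot time

cat "$SCRIPT"
```

Running it prints `findfs UUID=` followed by `findfs UUID=$uuid`; only the second line still refers to the variable when the generated script runs.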
Created attachment 430289 [details]
Patch that partially work
I had to update the patch to get the kdump kernel to start, but it still can't take a dump.
I had to change /sbin/findfs to /sbin/findfs_sys (and copy it there) because otherwise the two versions of findfs got mixed up (in practice only /sbin/findfs was used).
Either way, findfs_sys couldn't find the device (blkid couldn't find it either). The device appears in /sys/block/sdb, but no node for it is created in /dev/.
With this patch, at least UUID generation for the device works (I had to add /dev/ before `$device` because the argument passed to findstoragedriver() was just sda2).
I had to change the `blkid -sUUID` line to `blkid -o export -sUUID` because otherwise the UUID was wrapped in double quotes, which caused problems on the running system.
findfs only prints this:
findfs_sys: unable to resolve 'UUID=iV6wzv-mAaL-OW18-Z35b-t7tf-NIE1-AxbjPb'
So I guess the only problem now is with findfs not being able to recognize devices.
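A small sketch of the quoting difference mentioned above, run against sample strings rather than a real disk. It assumes the usual blkid formats: the default output wraps each value in double quotes (`NAME: UUID="..."`), while `-o export` prints unquoted `KEY=value` lines (possibly with other keys, depending on the version). Only the UUID comes from the log above; the device name and TYPE in the sample are assumptions, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sample lines in the two blkid formats. Only the UUID is from the log;
# /dev/sda2 and TYPE are assumed for illustration.
default_out='/dev/sda2: UUID="iV6wzv-mAaL-OW18-Z35b-t7tf-NIE1-AxbjPb" TYPE="LVM2_member"'
export_out='UUID=iV6wzv-mAaL-OW18-Z35b-t7tf-NIE1-AxbjPb'

# Default format: the value sits inside double quotes, so strip them.
uuid_from_default() {
    printf '%s\n' "$1" | sed -n 's/.*UUID="\([^"]*\)".*/\1/p'
}

# Export format: unquoted KEY=value lines; keep only the UUID line's value.
uuid_from_export() {
    printf '%s\n' "$1" | sed -n 's/^UUID=//p'
}

a=$(uuid_from_default "$default_out")
b=$(uuid_from_export "$export_out")
echo "$a"
echo "$b"
```

Both helpers yield the same bare UUID; with the default format, forgetting to strip the quotes is what leaves them embedded in the value.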
Created attachment 430431 [details]
new approach to fix this
Doh! I just realized something. While it's going to be workable in your case, there are several cases in which detecting a UUID is going to fail, as UUIDs apply to filesystems, not devices. What we need is an immutable, unique value that identifies a device regardless of the order (and therefore the name) in which it is detected. I'm afraid that's very difficult to put together, but this patch should get us closer to that goal. I haven't tested it yet, but if you'd like to give it a try, you're welcome to.
Created attachment 430667 [details]
Patch that handles same devices
After updating the patch (escaping, a `local` outside a function, and similar fixes), it started to work on my machine, but I think DSKSTRING is not as unique as it should be. As I see it, if the computer had two identical disks, it would wait for only one of them. So I propose the attached patch: it looks in /sys/block for devices that share the same DSKSTRING and writes their count as the third field in /etc/critical_disks; when it then waits for devices, it waits for the required number of them.
I've tested the patch on the machine and it works fine (kdump works, vmcore is created).
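The counting idea in the proposed patch can be sketched as follows: count how many devices share an identifier, then poll (rather than using a fixed `sleep 120`) until that many have appeared or a timeout expires. This is an illustration, not the attached patch: the real DSKSTRING construction and the exact /etc/critical_disks format are not shown in this report, so the sketch assumes each device directory holds a hypothetical `id` file with its identifier string, and the function names are made up.

```shell
#!/bin/sh
# ASSUMPTION: each device directory under $1 holds a file named "id" with
# its identifying string. This stands in for however the real patch builds
# DSKSTRING from /sys/block.

# Print how many device directories under $1 carry identifier $2.
count_matching() {
    n=0
    for dev in "$1"/*; do
        [ -f "$dev/id" ] || continue
        [ "$(cat "$dev/id")" = "$2" ] && n=$((n + 1))
    done
    echo "$n"
}

# Poll once a second until at least $3 devices with identifier $2 exist
# under $1; give up after $4 seconds. Returns 0 on success, 1 on timeout.
wait_for_count() {
    waited=0
    while [ "$(count_matching "$1" "$2")" -lt "$3" ]; do
        [ "$waited" -ge "$4" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demo tree: two identical disks plus one different device.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
for d in sda sdb sdc; do mkdir "$root/$d"; done
echo "DISK-MODEL-X" > "$root/sda/id"
echo "DISK-MODEL-X" > "$root/sdb/id"
echo "CDROM-Y"      > "$root/sdc/id"

count_matching "$root" "DISK-MODEL-X"                 # prints 2
wait_for_count "$root" "DISK-MODEL-X" 2 5 && echo ok  # prints ok
```

Storing the count alongside the identifier is what lets the waiting side tell "one of two identical disks has shown up" apart from "both have", which a single identifier entry cannot express.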
Yeah, I like that modification, thank you. It looks like this is slated for 6.1, so as soon as it's approved I'll commit it. Thanks!
Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
Created attachment 417634 [details]
Log from dell-pe2850-04.rhts.eng.bos.redhat.com
Description of problem:
The kdump kernel fails to mount the device in /dev/mapper/ because the device is never created on machine dell-pe2850-04.rhts.eng.bos.redhat.com running an i386 kernel.
Version-Release number of selected component (if applicable):
kernel: 2.6.32-30
kexec-tools: 2.0.0-72.el6
lvm2: 2.02.66-2.el6
How reproducible:
100%
Steps to Reproduce:
1. Set up kdump to dump to a local LVM device (e.g. /dev/mapper/vg_dellpe285004-lv_root).
2. Crash the kernel, e.g. echo c >/proc/sysrq-trigger
3. Watch the output.
Actual results:
The LVM device is not initialized (there is no LVM device in /dev/mapper/), so no dump can be taken.
Expected results:
The LVM device is initialized.
Additional info:
This appears to be machine/device-specific (any dell-pe2850-0x machine should reproduce it); other machines usually work. I'm attaching the console output.
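Before step 2, it can help to confirm that the dump target exists and that a crash kernel is actually loaded, so a failure can be attributed to the kdump kernel rather than to the setup. A minimal sketch, assuming the kernel exposes `/sys/kernel/kexec_crash_loaded` (it reads `1` once a crash kernel is loaded); the paths are parameters so the function can be tried against temporary files, and `kdump_ready` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the reproducer.
# $1: dump target, e.g. /dev/mapper/vg_dellpe285004-lv_root
# $2: crash-kernel flag file, normally /sys/kernel/kexec_crash_loaded
# Only checks that the target path exists, not that it is mountable.
kdump_ready() {
    if [ ! -e "$1" ]; then
        echo "dump target $1 does not exist" >&2
        return 1
    fi
    if [ "$(cat "$2" 2>/dev/null)" != "1" ]; then
        echo "no crash kernel loaded (check the kdump service)" >&2
        return 1
    fi
    return 0
}

# Example use on the test machine (commented out: the second line crashes it).
# kdump_ready /dev/mapper/vg_dellpe285004-lv_root /sys/kernel/kexec_crash_loaded &&
#     echo c > /proc/sysrq-trigger
```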