Bug 242648
| Field | Value |
|---|---|
| Summary | kdump broken on x86_64 |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | low |
| Priority | low |
| Reporter | Gerd Hoffmann <kraxel> |
| Assignee | Neil Horman <nhorman> |
| QA Contact | Martin Jenner <mjenner> |
| CC | jnomura, sprabhu |
| Fixed In Version | RHBA-2007-0959 |
| Doc Type | Bug Fix |
| Last Closed | 2007-11-07 19:51:22 UTC |
| Bug Blocks | 223736, 230752, 243442 |

Description
Gerd Hoffmann
2007-06-05 09:41:30 UTC
I'm looking at the boot code, and from what I see, max_low_pfn is set prior to every call to setup_bootmem_allocator, where that check takes place, so I'm not sure that your analysis is correct (at least not yet). What you're describing, however, sounds to me like the result of a patch that was added in 2.6.18-17.EL5 for bz 236759. It was supposed to fix a broken northbridge setup, but on some systems it resulted in a hang (and, as a side effect, a failure to validate crashkernel resources, probably from a miscomputed max_low_pfn value). I'm not sure of the details, but you can likely confirm this is the case by testing with kernel-2.6.18-16.EL5, and then again with kernel-2.6.18-17.EL5. If the problem is not present in the former and is present in the latter, then we can assume that's the problem we need to track down.

Created attachment 156327 [details]
workaround
I'm using the attached patch to work around the broken test (it just comments
the test out). I also added max_low_pfn output to the error message for
debugging; it showed max_low_pfn still being 0. I'll go fetch and test 16+17 soon.
Hmm, both 16 and 17 fail, so it must be something else. The current release (8.something) works OK, though. Will try the builds in between now ...

Created attachment 156329 [details]
boot log diff
The regression was added between 14.el5 and 15.el5.
Well, that's odd. That correlates with prarit's addition of the bounds checking, but that just brings me back to wondering why max_low_pfn is set improperly for you. I'll try to reproduce on an x86_64 system here, but I'm beginning to suspect that this is a problem with the max_low_pfn computation specific to your system. About the only other thing that might relate is my patch adding support for the Calgary IOMMU. Is this an IBM system, by any chance, that you're working with? If so, could you please try booting your system with iommu=soft on the command line?

No IBM box; it is a virtual machine using kvm (i.e. pretty standard PC hardware as emulated by qemu, Intel ich3 chipset IIRC ...).

OK, so it is prarit's check addition that has done this, and I still don't understand why max_low_pfn isn't set yet. On the upside, I just recreated it here. I'll start debugging right away. Thanks!

Created attachment 156376 [details]
patch to correct x86_64 crashdump validation
Think I found the problem. As it turns out, I was wrong before: x86_64 doesn't
even initialize max_low_pfn. max_low_pfn represents the maximum page frame
number that is allowed in lowmem, and x86_64, not currently having the concept
of lowmem (or rather, having the address space to treat all memory as de facto
lowmem), never sets it. That being the case, kexec should be able to
reserve memory from anywhere in the physical address space, as long as the RAM
actually exists. As such, instead of checking our crashkernel parameter
against max_low_pfn, we need to be checking it against end_pfn, which should
allow any address that is physically populated with RAM.
I've tested the attached patch, and it works well for me. Please test it on
your system as well and confirm. Thanks!
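
For readers following the analysis above, here is a minimal user-space sketch of the bounds check it describes. It is not the code from attachment 156376: the helper names `crash_region_ok` and `crash_region_examples`, the 1 GiB guest size, and the `crashkernel=128M@16M` values are all assumptions chosen for illustration. It only shows why a limit of 0 (an unset max_low_pfn) rejects every region, while a limit taken from end_pfn accepts any region backed by RAM.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12  /* 4 KiB pages, as on x86_64 */

/*
 * Hypothetical helper: returns true when [base, base + size) lies entirely
 * below the page frame number limit_pfn. Not the code from the attachment.
 */
bool crash_region_ok(unsigned long long base, unsigned long long size,
                     unsigned long long limit_pfn)
{
    unsigned long long end = base + size;

    if (size == 0 || end < base)   /* empty region or address wraparound */
        return false;
    return end <= (limit_pfn << PAGE_SHIFT);
}

/* Example values only: a guest with 1 GiB of RAM and crashkernel=128M@16M. */
void crash_region_examples(void)
{
    unsigned long long max_low_pfn = 0;    /* assumed unset, as in the analysis */
    unsigned long long end_pfn = 0x40000;  /* 1 GiB / 4 KiB */
    unsigned long long base = 16ULL << 20;
    unsigned long long size = 128ULL << 20;

    /* Checking against the unset max_low_pfn rejects a valid region. */
    assert(!crash_region_ok(base, size, max_low_pfn));
    /* Checking against end_pfn accepts it. */
    assert(crash_region_ok(base, size, end_pfn));
    /* A region starting at the end of populated RAM is still rejected. */
    assert(!crash_region_ok(1ULL << 30, size, end_pfn));
}
```

Calling `crash_region_examples()` from a small test program exercises all three cases; the assertions pass silently when the check behaves as described.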
I'm posting this, since today is the deadline for this.

This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.

Patch works fine for me, thanks.

*** Bug 242817 has been marked as a duplicate of this bug. ***

*** Bug 238987 has been marked as a duplicate of this bug. ***

in 2.6.18-27.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html