Red Hat Bugzilla – Bug 242648
kdump broken on x86_64
Last modified: 2007-11-30 17:07:44 EST
Description of problem:
kdump broken on x86_64
Version-Release number of selected component (if applicable):
Boot with crashkernel=something, then check dmesg and /proc/iomem:
reserving the crash kernel memory failed.
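For anyone who wants to script the /proc/iomem check, here is a minimal userspace sketch in C. It assumes the reservation appears under the resource name "Crash kernel" in the usual "start-end : name" layout; the function names parse_crash_region and find_crash_region are made up for this illustration and are not part of any kernel or kexec-tools API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem-style line, for example
 *   "  01000000-08ffffff : Crash kernel"
 * Returns 1 and fills *start and *end when the line names the crash kernel
 * region, 0 otherwise. (Hypothetical helper, for illustration only.)
 */
int parse_crash_region(const char *line,
                       unsigned long long *start,
                       unsigned long long *end)
{
    if (strstr(line, "Crash kernel") == NULL)
        return 0;
    /* The leading space in the format skips any indentation. */
    return sscanf(line, " %llx-%llx", start, end) == 2;
}

/*
 * Scan a file in /proc/iomem format. Returns 1 if a crash kernel
 * reservation was found, 0 if not, and -1 if the file cannot be opened.
 */
int find_crash_region(const char *path,
                      unsigned long long *start,
                      unsigned long long *end)
{
    char line[256];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_crash_region(line, start, end);
    fclose(f);
    return found;
}
```

Calling find_crash_region("/proc/iomem", &start, &end) on an affected kernel should return 0, since no region was reserved. Note that some kernels hide the real addresses from unprivileged readers, so it is safest to run this as root.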
linux-2.6-kdump-bounds-checking-for-crashkernel-args.patch is broken.
Initialization order bug: On x86_64 the sanity check happens before
max_low_pfn is initialized, thus it fails no matter what arguments
I'm looking at the boot code, and from what I see, max_low_pfn is set prior to
every call to setup_bootmem_allocator, where that check takes place, so I'm not
sure that your analysis is correct (at least not yet). What you're describing,
however, sounds to me like the result of a patch that was added in 2.6.18-17.EL5
for bz 236759. It was supposed to fix a broken northbridge setup, but on some
systems it resulted in a hang (and, as a side effect, a failure to validate
crashkernel resources, probably from a miscomputed max_low_pfn value). I'm not
sure of the details, but you can likely confirm this is the case by testing with
kernel-2.6.18-16.EL5, and then again with kernel-2.6.18-17.EL5. If the problem
is not present in the former and is present in the latter, then we can assume
that's the problem we need to track down.
Created attachment 156327 [details]
I'm using the attached patch to work around the broken test (it just comments
the test out). I had also added max_low_pfn output to the error message for
debugging; it showed max_low_pfn still being 0. I'll go fetch and test 16+17 soon.
Hmm, both 16 and 17 fail, so it must be something else. The current release
(8.something) works OK though. Will try the builds in between now ...
Created attachment 156329 [details]
boot log diff
The regression was added between 14.el5 and 15.el5.
Well, that's odd. That correlates with prarit's addition of the bounds
checking, but that just brings me back to wondering why max_low_pfn is set
improperly for you. I'll try to reproduce on an x86_64 system here, but I'm
beginning to suspect that this is a problem with the max_low_pfn computation
specific to your system.
About the only other thing that might relate is my patch adding support for the
Calgary IOMMU. Is this an IBM system by any chance that you're working with? If
so, could you please try booting your system with iommu=soft on the command line?
Not an IBM box, it is a virtual machine using kvm (i.e. pretty standard PC
hardware as emulated by qemu, Intel ich3 chipset IIRC ...).
OK, so it is prarit's check addition that has done this, and I still don't
understand why max_low_pfn isn't set yet. On the up side, I just recreated it
here. I'll start debugging right away. Thanks!
Created attachment 156376 [details]
patch to correct x86_64 crashdump validation
Think I found the problem. As it turns out I was wrong before: x86_64 doesn't
even initialize max_low_pfn. max_low_pfn represents the maximum page frame
number that is allowed in lowmem, and x86_64, not currently having the concept
of lowmem (or rather, having the address space to treat all memory as de facto
lowmem), never sets it. That being the case, kexec should be able to
reserve memory from anywhere in the physical address space, as long as the RAM
actually exists. As such, instead of checking our crashkernel parameter
against max_low_pfn, we need to be checking it against end_pfn, which should
allow any address that is physically populated with RAM.
I've tested the attached patch, and it works well for me. Please test it on
your system as well and confirm. Thanks!
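To illustrate the reasoning above (this is not the attached patch, just a standalone userspace sketch), the bounds check can be modelled like this. The function name crashkernel_range_ok, the parameter limit_pfn, and the 4 KiB PAGE_SHIFT are assumptions made for the example; in the kernel the limit would come from max_low_pfn (old check) or end_pfn (proposed check).

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical model of the crashkernel bounds check.
 * Returns 1 if the region [base, base + size) lies entirely below the
 * page frame limit, 0 otherwise.
 */
int crashkernel_range_ok(uint64_t base, uint64_t size, uint64_t limit_pfn)
{
    uint64_t limit = limit_pfn << PAGE_SHIFT;

    /* Reject empty requests and requests that wrap around. */
    if (size == 0 || base + size < base)
        return 0;
    return base + size <= limit;
}
```

With limit_pfn set to 0, which is what an uninitialized max_low_pfn amounts to on x86_64, every request is rejected regardless of base and size, which would match the failure reported here. With limit_pfn set to an end_pfn such as 0x40000 (1 GiB of RAM), a request like 128M@16M passes, while a request extending past the end of RAM is still refused.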
I'm posting this, since today is the deadline for this.
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla
Patch works fine for me, thanks.
*** Bug 242817 has been marked as a duplicate of this bug. ***
*** Bug 238987 has been marked as a duplicate of this bug. ***
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.