Bug 242648 - kdump broken on x86_64
Summary: kdump broken on x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
: ---
Assignee: Neil Horman
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 238987 242817
Depends On:
Blocks: 223736 230752 243442
 
Reported: 2007-06-05 09:41 UTC by Gerd Hoffmann
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 19:51:22 UTC
Target Upstream Version:
Embargoed:


Attachments
workaround (1.27 KB, patch)
2007-06-06 08:09 UTC, Gerd Hoffmann
boot log diff (15.39 KB, text/plain)
2007-06-06 09:20 UTC, Gerd Hoffmann
patch to correct x86_64 crashdump validation (1.04 KB, patch)
2007-06-06 18:05 UTC, Neil Horman


Links
System: Red Hat Product Errata
ID: RHBA-2007:0959
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages for Red Hat Enterprise Linux 5 Update 1
Last Updated: 2007-11-08 00:47:37 UTC

Description Gerd Hoffmann 2007-06-05 09:41:30 UTC
Description of problem:
kdump broken on x86_64

Version-Release number of selected component (if applicable):
2.6.18-20.el5

How reproducible:
Boot with crashkernel=something, then check dmesg and /proc/iomem:
reserving crash kernel memory failed.

Additional info:
linux-2.6-kdump-bounds-checking-for-crashkernel-args.patch is broken.
Initialization order bug: on x86_64 the sanity check runs before
max_low_pfn is initialized, so it fails no matter what arguments
were given.

Comment 1 Neil Horman 2007-06-05 20:07:46 UTC
I'm looking at the boot code, and from what I see, max_low_pfn is set prior to
every call to setup_bootmem_allocator, where that check takes place, so I'm not
sure your analysis is correct (at least not yet).  What you're describing,
however, sounds to me like the result of a patch that was added in 2.6.18-17.EL5
for bz 236759.  It was supposed to fix a broken northbridge setup, but on some
systems it resulted in a hang (and, as a side effect, a failure to validate
crashkernel resources, probably from a miscomputed max_low_pfn value).  I'm not
sure of the details, but you can likely confirm this by testing with
kernel-2.6.18-16.EL5 and then again with kernel-2.6.18-17.EL5.  If the problem
is not present in the former but is in the latter, then we can assume that's the
problem we need to track down.

Comment 2 Gerd Hoffmann 2007-06-06 08:09:55 UTC
Created attachment 156327 [details]
workaround

I'm using the attached patch to work around the broken test (it just comments
the test out).  I also added max_low_pfn to the error message for debugging;
it showed max_low_pfn still being 0.  I'll go fetch and test -16 and -17 soon.

Comment 3 Gerd Hoffmann 2007-06-06 08:44:02 UTC
Hmm, both 16 and 17 fail, so it must be something else.  The current release
(8.something) works OK, though.  I'll try the builds in between now ...

Comment 4 Gerd Hoffmann 2007-06-06 09:20:35 UTC
Created attachment 156329 [details]
boot log diff

The regression was added between 14.el5 and 15.el5.

Comment 5 Neil Horman 2007-06-06 13:37:26 UTC
Well, that's odd.  That correlates with prarit's addition of the bounds checking,
but that just brings me back to wondering why max_low_pfn is set
improperly for you.  I'll try to reproduce on an x86_64 system here, but I'm
beginning to suspect that this is a problem with the max_low_pfn computation
specific to your system.

About the only other thing that might relate is my patch adding support for the
Calgary IOMMU.  Is this by any chance an IBM system you're working with?  If
so, could you please try booting your system with iommu=soft on the command line?

Comment 6 Gerd Hoffmann 2007-06-06 13:59:33 UTC
Not an IBM box; it is a virtual machine using KVM (i.e. pretty standard PC
hardware as emulated by QEMU, Intel ICH3 chipset IIRC ...).

Comment 7 Neil Horman 2007-06-06 14:24:59 UTC
OK, so it is prarit's check addition that has done this, and I still don't
understand why max_low_pfn isn't set yet.  On the upside, I just reproduced it
here.  I'll start debugging right away.  Thanks!

Comment 8 Neil Horman 2007-06-06 18:05:37 UTC
Created attachment 156376 [details]
patch to correct x86_64 crashdump validation 

Think I found the problem.  As it turns out, I was wrong before: x86_64 doesn't
even initialize max_low_pfn.  max_low_pfn is the maximum page frame number
allowed in lowmem, and x86_64 has no concept of lowmem (or rather, it has
enough address space to treat all memory as de facto lowmem), so it never sets
max_low_pfn.  That being the case, kexec should be able to reserve memory from
anywhere in the physical address space, as long as the RAM actually exists.
So instead of checking our crashkernel parameter against max_low_pfn, we need
to check it against end_pfn, which should allow any address that is physically
populated with RAM.

I've tested the attached patch, and it works well for me.  Please test it on
your system as well and confirm.  Thanks!

Comment 9 Neil Horman 2007-06-06 18:10:57 UTC
I'm posting this, since today is the deadline for this.

Comment 10 RHEL Program Management 2007-06-06 19:01:59 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 11 Gerd Hoffmann 2007-06-07 06:34:50 UTC
Patch works fine for me, thanks.

Comment 12 Jun'ichi Nomura (Red Hat) 2007-06-07 14:27:31 UTC
*** Bug 242817 has been marked as a duplicate of this bug. ***

Comment 13 Neil Horman 2007-06-12 16:06:33 UTC
*** Bug 238987 has been marked as a duplicate of this bug. ***

Comment 14 Don Zickus 2007-06-16 00:38:41 UTC
Included in 2.6.18-27.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 17 errata-xmlrpc 2007-11-07 19:51:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html


