215437 – kexec tools do not pass the ACPI NVS space in the kdump exactmap

Summary: kexec tools do not pass the ACPI NVS space in the kdump exactmap

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	5.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Neil Horman
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-11-13 22:49 UTC by Amul Shah
Modified:	2007-11-30 22:07 UTC
CC List:	3 users
Fixed In Version:	beta2
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-12-23 02:38:43 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Ben's original patch to add ACPI NVS space to the exactmap (986 bytes, patch) 2006-11-13 22:49 UTC, Amul Shah
kdump kernel hang (3.24 KB, text/plain) 2006-11-13 22:54 UTC, Amul Shah

Description Amul Shah 2006-11-13 22:49:05 UTC

+++ This bug was initially created as a clone of Bug #215417 +++

Description of problem:
The kexec tools do not pass the ACPI NVS region to the kdump kernel as a
reserved memory area.

--- Text copied directly from Benjamin Romer's email to the fastboot mailing list ---
I'd like to submit a patch to kexec that addresses a serious problem
with kdump on the Unisys ES7000/600 system. We initially encountered
this issue on SUSE's SLES 10 beta distributions.

On the ES7000/600, the ACPI data is located in the 3GB range, and above
that is an ACPI NVS region. The problem is that kexec, when loading a
dump kernel, does not include the ACPI NVS region in the memory map it
provides to the dump kernel. This causes a kernel panic early in the
dump kernel's boot process:

Bootdata ok (command line is root=/dev/sda2 showopts console=tty0
console=ttyS0,115200n8 earlyprintk=serial,ttyS0,115200n8 memmap=exactmap
memmap=640K@0K memmap=3296K@16384K memmap=61599K@20321K
elfcorehdr=20320K memmap=408K#3144128K)
Linux version 2.6.16.14-6-kdump (geeko@buildhost) (gcc version 4.1.0
(SUSE Linux)) #1 Tue May 9 12:09:06 UTC 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009e400 (usable)
 BIOS-e820: 000000000009e400 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfe70000 (usable)
 BIOS-e820: 00000000bfe70000 - 00000000bfed6000 (ACPI data)
 BIOS-e820: 00000000bfed6000 - 00000000bff00000 (ACPI NVS)
 BIOS-e820: 00000000bff00000 - 00000000e8000000 (usable)
 BIOS-e820: 00000000f8000000 - 00000000fec00000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000810000000 (usable)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 0000000001338000 (usable)
 user: 00000000013d8400 - 0000000005000000 (usable)
 user: 00000000bfe70000 - 00000000bfed6000 (ACPI data)
kernel direct mapping tables up to bfed6000 @ 8000-8000
PANIC: early exception rip 10 error ffffffff8131433b cr2 2b0ed2682180

Call Trace: <ffffffff8131433b>{reserve_bootmem_core+78}
       <ffffffff81312b52>{reserve_bootmem_generic+19}
       <ffffffff81310ea7>{smp_scan_config+145}
       <ffffffff81310f02>{find_intel_smp+54}
       <ffffffff8130b6af>{setup_arch+2158}
       <ffffffff813045de>{start_kernel+42}
       <ffffffff81304259>{_sinittext+601}
RIP 0x10

We have determined that the cause of this panic is that the kernel
attempts to reserve the ACPI NVS region, which is defined by a pointer
stored in the ACPI data region, but cannot reserve memory above the
maximum usable memory limit. The kernel determines the maximum usable
memory by taking the highest address of usable memory specified in the
memory map; so it is setting the value to 0x5000000, as listed in the
map, then attempting to reserve memory above 0xbfed6000, which triggers
a panic.
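
The reasoning above can be checked against the two maps in the log. The
following Python sketch is illustrative only and is not kernel code: it takes
the user-defined map from the log, computes the highest usable address the way
this report describes it, and tests whether the ACPI NVS range from the BIOS
map starts at or above that limit. The names USER_MAP, ACPI_NVS,
highest_usable_end, and is_above_limit are invented for this example.

```python
# Ranges copied from the "user-defined physical RAM map" in the log above.
USER_MAP = [
    (0x0000000000000000, 0x00000000000a0000, "usable"),
    (0x0000000001000000, 0x0000000001338000, "usable"),
    (0x00000000013d8400, 0x0000000005000000, "usable"),
    (0x00000000bfe70000, 0x00000000bfed6000, "ACPI data"),
]

# ACPI NVS range copied from the BIOS-e820 map; it is absent from USER_MAP.
ACPI_NVS = (0x00000000bfed6000, 0x00000000bff00000)


def highest_usable_end(ranges):
    """Return the end address of the highest range marked usable."""
    return max(end for _start, end, kind in ranges if kind == "usable")


def is_above_limit(region, limit):
    """Return True if the region starts at or above the given limit."""
    start, _end = region
    return start >= limit


limit = highest_usable_end(USER_MAP)
print(hex(limit))                       # 0x5000000
print(is_above_limit(ACPI_NVS, limit))  # True
```

With the maps from the log, the limit is 0x5000000 and the NVS range begins far
above it, which matches the failure described above.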

By modifying kexec to also pass the ACPI NVS region as reserved memory
in the memory map, the kernel will not panic. We have tested this on
both the ES7000/600 and a Dell server system which exhibited the same
problem and it worked on both. The attached patch file contains the
changes that we made, and applies to kexec-tools-1.101.
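
As a rough illustration of what an extra map entry would look like, the Python
sketch below converts an e820 range into a memmap= argument in the size/start
form seen on the command line above. It is not the attached patch, and the
helper name format_memmap is hypothetical. The kernel's memmap= parameter uses
"@" for usable RAM, "#" for ACPI data, and "$" for reserved memory; using "$"
for the NVS range here is an assumption, since the separator the patch actually
emits is not shown in this report.

```python
def format_memmap(start, end, separator):
    """Format the byte range [start, end) as memmap=<size>K<sep><start>K."""
    if start % 1024 or end % 1024:
        raise ValueError("range is not aligned to 1K")
    size_k = (end - start) // 1024
    start_k = start // 1024
    return f"memmap={size_k}K{separator}{start_k}K"


# ACPI data range from the BIOS map; this reproduces the argument in the log.
acpi_data = format_memmap(0xbfe70000, 0xbfed6000, "#")
print(acpi_data)  # memmap=408K#3144128K

# ACPI NVS range from the BIOS map, marked reserved (assumed separator "$").
acpi_nvs = format_memmap(0xbfed6000, 0xbff00000, "$")
print(acpi_nvs)   # memmap=168K$3144536K
```

The first result matches the memmap=408K#3144128K entry already present in the
failing command line; the second is the kind of entry that was missing.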

Version-Release number of selected component (if applicable):
Tested with RHEL5 Beta 2 Milestone 9

How reproducible:
Always on an ES7000

Steps to Reproduce:
1. Set up kdump with the boot parameter crashkernel=64M@16M
2. Load the kdump kernel with kexec -p /boot/vmlinux-kdump --args-linux
--command-line="`cat /proc/cmdline` lpj=1306000
earlyprintk=serial,ttyS0,115200n8" --initrd=/boot/initrd-kdump
3. Trigger a crash with alt-sysrq-c (or echo c > /proc/sysrq-trigger)
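
For reference, the crashkernel=64M@16M value in step 1 reserves 64 MB starting
at the 16 MB mark. The Python sketch below is a simplified illustration of that
arithmetic, not the kernel's own parser; the helper names parse_size and
parse_crashkernel are invented here, and only decimal numbers with an optional
K, M, or G suffix are handled.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '64M' or '16384K' to a byte count."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def parse_crashkernel(value):
    """Return (start, end) in bytes for a '<size>@<offset>' value."""
    size_text, _, offset_text = value.partition("@")
    if not offset_text:
        raise ValueError("expected <size>@<offset>")
    start = parse_size(offset_text)
    return start, start + parse_size(size_text)


start, end = parse_crashkernel("64M@16M")
print(hex(start), hex(end))  # 0x1000000 0x5000000
```

The resulting range, 0x1000000 to 0x5000000, lines up with the usable ranges in
the user-defined map quoted in the description, which begin at 0x1000000 and
end at 0x5000000.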
  
Actual results:
Unfortunately the kernel just hangs.  I am attaching the serial console output
from early_printk.  I'll try modifying the kernel and upgrading the kexec tools
to see if I can find out some more information.

Expected results:
Kdump kernel should boot up.

Additional info:
Original patch submission to the fastboot mailing list
http://lists.osdl.org/pipermail/fastboot/2006-June/003202.html

Inclusion into the kexec tool tree
http://lists.osdl.org/pipermail/fastboot/2006-July/003412.html

Comment 1 Amul Shah 2006-11-13 22:49:05 UTC

Created attachment 141118
Ben's original patch to add ACPI NVS space to the exactmap

Comment 2 Amul Shah 2006-11-13 22:54:49 UTC

Created attachment 141119
kdump kernel hang

Comment 3 Neil Horman 2006-11-14 17:32:12 UTC

fixed in -132.el5. thanks.

Comment 4 RHEL Program Management 2006-11-14 20:00:43 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 5 RHEL Program Management 2006-12-23 02:38:43 UTC

A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
