Bug 521865 - Xen fails to boot on ia64 with > 128GB memory
Summary: Xen fails to boot on ia64 with > 128GB memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: ia64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 508651 563813 583598 645745
Depends On:
Blocks: 500798 5.5TechNotes-Updates
 
Reported: 2009-09-08 14:51 UTC by Jiri Zapletal
Modified: 2010-11-22 23:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The "xenheap_megabytes" hypervisor option is now supported on ia64 systems as well. The option can be used to run the Xen hypervisor on ia64 systems with more than 64GB of RAM. If the installed memory exceeds 64GB, it is suggested to set the option to a value equal to the memory size in gigabytes. For example, on a system with 128GB of memory the elilo.conf file should include the directive append="xenheap_megabytes=128 --"
Clone Of:
Environment:
Last Closed: 2010-03-30 07:36:02 UTC
Target Upstream Version:
Embargoed:


Attachments
Full log from xen (2.50 KB, text/plain)
2009-09-08 14:51 UTC, Jiri Zapletal
Xen log - 3.1.2-164.el5.pbtest - OK (15.22 KB, text/plain)
2009-09-23 09:19 UTC, Jiri Zapletal
patch adding a "mem" commandline param (2.31 KB, patch)
2009-09-24 13:00 UTC, Paolo Bonzini
patch adding a "xenheap_megabytes" commandline param (2.48 KB, patch)
2009-09-30 13:28 UTC, Paolo Bonzini
xm dmesg on -190 (8.38 KB, application/octet-stream)
2010-03-03 17:29 UTC, Jan Tluka
dom dmesg on -190 (19.68 KB, application/octet-stream)
2010-03-03 17:29 UTC, Jan Tluka


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Jiri Zapletal 2009-09-08 14:51:46 UTC
Created attachment 360088 [details]
Full log from xen

Description of problem:
Booting Xen on ia64 with a lot of memory (128 GB) fails with a Xen panic.
I suspect the heap addresses are miscalculated.

(XEN) System RAM: 122606MB (125549552kB)
(XEN) size of virtual frame_table: 306736kB
(XEN) virtual machine to physical table: f3fffffeef008038 size: 61488kB
(XEN) max_page: 0x221feff9
(XEN) Xen heap: 17592186044409MB (18014398509475696kB)
(XEN) Cannot handle page request order 1!
(XEN) Xen BUG at page_alloc.c:343
(XEN) FIXME: implement ia64 dump_execution_state()

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-164.el5
xen-3.0.3-94.el5

How reproducible:
Always

Steps to Reproduce:
1. boot xen 3.1.2-164.el5 on ia64
2. watch errors on console
  
Actual results:
Xen panics and reboots.

Expected results:
Xen works

Additional info:

Comment 2 Paolo Bonzini 2009-09-17 09:46:57 UTC
The reported number of kilobytes, 18014398509475696, is exactly

((uint64_t) -6438912) >> 10

Comment 3 Paolo Bonzini 2009-09-17 09:48:22 UTC
... so the problem is probably not unsigned/signed, but really some kind of overflow that yields a negative number of bytes for the Xen heap size.  Then the number is shifted 10 bits right to get the KB and 20 bits right to get the MB.

Comment 4 Paolo Bonzini 2009-09-17 10:05:31 UTC
Here is how the log matches the code in arch/ia64/xen/xensetup.c:

(XEN) xen image pstart: 0x4000000, xenheap pend: 0x8000000
>>>> xen_pstart       = 0x4000000
>>>> xenheap_phys_end = 0x8000000
>>>> xenheap_size     = 0x4000000

(XEN) find_memory: efi_memmap_walk returns max_page=221feff9
(XEN) Before xen_heap_start: f0000000041e0c80
(XEN) After xen_heap_start: f000000008624000

The problem is that xen_heap_start ends up at the physical address 0x8624000, which is beyond the end of the statically allocated space (xenheap_phys_end = 0x8000000).  Can you try this patch?

diff --git a/include/asm-ia64/config.h b/include/asm-ia64/config.h
index 12c9bf9..1ed4fd2 100644
--- a/include/asm-ia64/config.h
+++ b/include/asm-ia64/config.h
@@ -113,8 +113,8 @@ extern char _end[]; /* standard ELF symbol */
 ///////////////////////////////////////////////////////////////
 // xen/include/asm/config.h
 // Natural boundary upon TR size to define xenheap space
-#define XENHEAP_DEFAULT_MB (1 << (KERNEL_TR_PAGE_SHIFT - 20))
-#define XENHEAP_DEFAULT_SIZE	(1 << KERNEL_TR_PAGE_SHIFT)
+#define XENHEAP_DEFAULT_MB (4 << (KERNEL_TR_PAGE_SHIFT - 20))
+#define XENHEAP_DEFAULT_SIZE	(4 << KERNEL_TR_PAGE_SHIFT)
 #define	ELFSIZE	64
 
 ///////////////////////////////////////////////////////////////

Comment 5 Paolo Bonzini 2009-09-18 06:58:29 UTC
Related upstream c/s are 19109, 19130 and possibly 19129.

Comment 6 Paolo Bonzini 2009-09-21 08:34:54 UTC
I placed test kernels at http://people.redhat.com/pbonzini/bz521865/

Please report if these packages fix the bug.  Thanks!

Comment 7 Jiri Zapletal 2009-09-23 08:34:18 UTC
(In reply to comment #6)
> I placed test kernels at http://people.redhat.com/pbonzini/bz521865/
> 
> Please report if these packages fix the bug.  Thanks!  

The provided RPM kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm is truncated.

[root@hp-rx8640-03 ~]# rpmsign -K kernel-*
kernel-2.6.18-164.el5.pbtest.ia64.rpm: sha1 md5 OK
kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm: sha1 MD5 NOT OK

1d5cb7992e0cae0604cd5aeceb612197  pub/kernel-2.6.18-164.el5.pbtest.ia64.rpm
1d5cb7992e0cae0604cd5aeceb612197  brew/kernel-2.6.18-164.el5.pbtest.ia64.rpm
c68acdc13617b4e19488f5e740e5623b  pub/kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm
33104c52cfd01cdded4a905f93c534e2  brew/kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm

I am using the RPMs from your brew build instead.

Comment 8 Jiri Zapletal 2009-09-23 09:17:52 UTC
Using the provided RPMs (kernel-2.6.18-164.el5.pbtest.ia64.rpm, kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm), Xen boots and passes basic smoke testing (Xen PV).

Xen boot log attached.

Comment 9 Jiri Zapletal 2009-09-23 09:19:19 UTC
Created attachment 362224 [details]
Xen log - 3.1.2-164.el5.pbtest - OK

Comment 10 Paolo Bonzini 2009-09-24 13:00:45 UTC
Created attachment 362490 [details]
patch adding a "mem" commandline param

Comment 11 Paolo Bonzini 2009-09-24 13:01:54 UTC
The new patch will require release notes.

Comment 12 Paolo Bonzini 2009-09-24 18:58:06 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
On ia64 systems, the Xen hypervisor is now able to run on systems with 128GB of memory or more.  However, for these systems to boot successfully, the "mem" command-line argument has to be passed to Xen.  For example, on a system with 128GB of memory the elilo.conf file should include the directive 

      append="mem=128G --"

Comment 13 Chris Lalancette 2009-09-25 14:30:49 UTC
*** Bug 508651 has been marked as a duplicate of this bug. ***

Comment 14 Paolo Bonzini 2009-09-30 13:27:11 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1,3 @@
-On ia64 systems, the Xen hypervisor is now able to run on systems with 128GB of memory or more.  However, for these systems to boot successfully, the "mem" command-line argument has to be passed to Xen.  For example, on a system with 128GB of memory the elilo.conf file should include the directive 
+The "xenheap_megabytes" hypervisor option is now supported on ia64 systems as well.  The option can be used to run the Xen hypervisor on ia64 systems with more than 64GB of RAM.  If the installed memory exceeds 64GB, it is suggested to set the option to a value equal to the memory size in gigabytes.  For example, on a system with 128GB of memory the elilo.conf file should include the directive 
 
-      append="mem=128G --"
+      append="xenheap_megabytes=128 --"

Comment 15 Paolo Bonzini 2009-09-30 13:28:00 UTC
Created attachment 363182 [details]
patch adding a "xenheap_megabytes" commandline param

Comment 17 Don Zickus 2009-11-03 22:53:03 UTC
The fix is in kernel-2.6.18-172.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 19 Miroslav Rezanina 2010-02-11 09:28:49 UTC
*** Bug 563813 has been marked as a duplicate of this bug. ***

Comment 20 Jan Tluka 2010-03-03 17:27:40 UTC
Reproduced on RHEL5.4, kernel panic.

Verified on the -190.el5 kernel. Dom0 booted successfully.

 __  __            _____  _   ____     _  ___   ___       _ ____  
 \ \/ /___ _ __   |___ / / | |___ \   / |/ _ \ / _ \  ___| | ___| 
  \  // _ \ '_ \    |_ \ | |   __) |__| | (_) | | | |/ _ \ |___ \ 
  /  \  __/ | | |  ___) || |_ / __/|__| |\__, | |_| |  __/ |___) |
 /_/\_\___|_| |_| |____(_)_(_)_____|  |_|  /_/ \___(_)___|_|____/ 
                                                                  
 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 3.1.2-190.el5 (mockbuild) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Mon Feb 22 19:11:19 EST 2010
 Latest ChangeSet: unavailable

(XEN) Xen command line: BOOT_IMAGE=scsi0:EFI\redhat\xen.gz-2.6.18-190.el5  xenheap_megabytes=128 
(XEN) xen image pstart: 0x4000000, xenheap pend: 0xc000000
(XEN) Xen patching physical address access by offset: 0x0
(XEN) find_memory: efi_memmap_walk returns max_page=221feff9
(XEN) Before xen_heap_start: f0000000041e0c80
(XEN) After xen_heap_start: f000000008624000
(XEN) warning: skipping physical page 0
(XEN) Init boot pages: 0x4000 -> 0x4000000.
(XEN) Init boot pages: 0xc000000 -> 0x1fffc000.
(XEN) Init boot pages: 0x70020000000 -> 0x705fb000000.
(XEN) Init boot pages: 0x78000000000 -> 0x787fc000000.
(XEN) Init boot pages: 0x80000000000 -> 0x807fc000000.
(XEN) Init boot pages: 0x88000000000 -> 0x887f9b76000.
(XEN) Init boot pages: 0x887fa609130 -> 0x887fa99f000.
(XEN) Init boot pages: 0x887faf7d6ed -> 0x887faf90018.
(XEN) Init boot pages: 0x887faf90df8 -> 0x887faf92018.
(XEN) Init boot pages: 0x887faf92078 -> 0x887faf95fa0.
(XEN) Init boot pages: 0x887faf95fea -> 0x887fbd9c000.
(XEN) Init boot pages: 0x887fbe80000 -> 0x887fbfe4000.
(XEN) System RAM: 122606MB (125549552kB)
(XEN) size of virtual frame_table: 306736kB
(XEN) virtual machine to physical table: f3fffffeef008038 size: 61488kB
(XEN) max_page: 0x221feff9
(XEN) Xen heap: 57MB (59248kB)
(XEN) Reserving non-aligned node boundary @ mfn 469794816
(XEN) Reserving non-aligned node boundary @ mfn 503316480
(XEN) Reserving non-aligned node boundary @ mfn 536870912
(XEN) Reserving non-aligned node boundary @ mfn 570425344
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) avail:0x3170074000000000, status:0x74000000000,control:0x3170010000000000, vm?0x10000000000
(XEN) WARNING: no opcode provided from hardware(0)!!!
(XEN) vm buffer size: 1048576, order: 6
(XEN) vm_buffer: 0xf000000008700000
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Time init:
(XEN) .... System Time: 25122ns
(XEN) .... scale:              280000000
(XEN) num_online_cpus=1, max_cpus=64
(XEN) Brought up 32 CPUs
(XEN) xenoprof: using perfmon.
(XEN) perfmon: version 2.0 IRQ 238
(XEN) perfmon: Montecito PMU detected, 27 PMCs, 35 PMDs, 12 counters (47 bits)
(XEN) Maximum number of domains: 63; 18 RID bits per domain
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Maximum permitted dom0 size: 121883MB
(XEN) elf_parse_binary: phdr: paddr=0x4000000 memsz=0x8cdd00
(XEN) elf_parse_binary: phdr: paddr=0x48d0000 memsz=0x8b90
(XEN) elf_parse_binary: phdr: paddr=0x48e0000 memsz=0x5caee0
(XEN) elf_parse_binary: memory: 0x4000000 -> 0x4eaaee0
(XEN) elf_xen_addr_calc_check: VIRT_BASE unset, using 0x0
(XEN) elf_xen_addr_calc_check: ELF_PADDR_OFFSET unset, using 0x0
(XEN) elf_xen_addr_calc_check: addresses:
(XEN)     virt_base        = 0x0
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0x0
(XEN)     virt_kstart      = 0x4000000
(XEN)     virt_kend        = 0x4eaaee0
(XEN)     virt_entry       = 0x400ff20
(XEN)  Dom0 kernel: 64-bit, lsb, paddr 0x4000000 -> 0x4eaaee0
(XEN) METAPHYSICAL MEMORY ARRANGEMENT:
(XEN)  Kernel image:  4000000->4eaaee0
(XEN)  Entry address: 400ff20
(XEN)  Init. ramdisk: 4eb0000 len 5de6ed
(XEN)  Start info.:   4eac000->4eb0000
(XEN) Dom0 max_vcpus=4
(XEN) Dom0: 0xf000000008e54080
(XEN) enable lsapic entry: 0xf0000705fd7a01cc
(XEN) enable lsapic entry: 0xf0000705fd7a01d8
(XEN) enable lsapic entry: 0xf0000705fd7a01e4
(XEN) enable lsapic entry: 0xf0000705fd7a01f0
(XEN) DISABLE lsapic entry: 0xf0000705fd7a01fc
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0208
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0214
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0220
(XEN) DISABLE lsapic entry: 0xf0000705fd7a022c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0238
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0244
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0250
(XEN) DISABLE lsapic entry: 0xf0000705fd7a025c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0268
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0274
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0280
(XEN) DISABLE lsapic entry: 0xf0000705fd7a028c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0298
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02a4
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02b0
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02bc
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02c8
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02d4
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02e0
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02ec
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02f8
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0304
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0310
(XEN) DISABLE lsapic entry: 0xf0000705fd7a031c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0328
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0334
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0340
(XEN) Success Disabling SRAT
(XEN) Success Disabling SLIT
(XEN) Domain0 EFI passthrough: ACPI 2.0=0x705fd7a0000 SMBIOS=0x1fffe000
(XEN) Scrubbing Free RAM: .............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) Linux version 2.6.18-190.el5xen (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Mon Feb 22 19:34:51 EST 2010
(XEN) 
EFI v1.00 by Xen/ia64: SALsystab=0x4178 ACPI 2.0=0x705fd7a0000 SMBIOS=0x1fffe000
(XEN) 
booting generic kernel on platform hpzx1
(XEN) 
rsvd_region[0]: [0xe000000000004228, 0xe000000000004a70)
(XEN) 
rsvd_region[1]: [0xe000000001000000, 0xe000000001000048)
(XEN) 
rsvd_region[2]: [0xe000000004000000, 0xe000000004eaaee0)
(XEN) 
rsvd_region[3]: [0xe000000004eac000, 0xe00000000548e6ed)
(XEN) 
rsvd_region[4]: [0xffffffffffffffff, 0xffffffffffffffff)
(XEN) 
Initial ramdisk at: 0xe000000004eb0000 (6153965 bytes)
(XEN) 
SAL 0.1: Xen/ia64 Xen/ia64 version 0.0
(XEN) 
SAL: AP wakeup using external interrupt vector 0xf3
(XEN) 
No logical to physical processor mapping available
(XEN) 
vcpu_set_itc: Setting ar.itc is currently disabled (this message is only displayed once)
(XEN) cpu_init: PAL max_purges is overridden to 1 PALO is required for multiple outsanding ptc.g 
(XEN) 
ACPI: Local APIC address c0000000fee00000
(XEN) 
<G><2>mm.c:1164:d0 efi_mmio: physaddr 0xf801001c800 size = 0x4000
(XEN) GSI 16 (level, low) -> CPU 0 (0x0000) vector 49
(XEN) 
4 CPUs available, 32 CPUs total
(XEN) 
Running on Xen! start_info_pfn=0x13ab nr_pages=262144 flags=0x3
(XEN) 
MCA related initialization done
(XEN) 
Virtual mem_map starts at 0xa0007ff9df21c000
(XEN) 
SMP: Allowing 32 CPUs, 28 hotplug CPUs
(XEN) 
Built 1 zonelists.  Total pages: 257754
(XEN) 
Kernel command line: root=/dev/VolGroup00/LogVol00 rhgb quiet ro
(XEN) 
<2>arch_boot_vcpu: vcpu 1 awaken
(XEN) WARN: GSI 23 in use by Xen.

Comment 21 Jan Tluka 2010-03-03 17:29:00 UTC
Created attachment 397620 [details]
xm dmesg on -190

xm dmesg on a successful boot of the -190.el5xen kernel.

Comment 22 Jan Tluka 2010-03-03 17:29:35 UTC
Created attachment 397621 [details]
dom dmesg on -190

dom0 dmesg on -190.el5xen kernel

Comment 24 errata-xmlrpc 2010-03-30 07:36:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 27 Andrew Jones 2010-04-21 14:59:02 UTC
*** Bug 583598 has been marked as a duplicate of this bug. ***

Comment 28 Paolo Bonzini 2010-10-22 12:15:58 UTC
*** Bug 645745 has been marked as a duplicate of this bug. ***

