Created attachment 360088 [details]
Full log from xen

Description of problem:
Booting Xen on IA64 with a lot of memory (128 GB) fails with a Xen panic. I suspect the heap addresses are miscalculated.

(XEN) System RAM: 122606MB (125549552kB)
(XEN) size of virtual frame_table: 306736kB
(XEN) virtual machine to physical table: f3fffffeef008038 size: 61488kB
(XEN) max_page: 0x221feff9
(XEN) Xen heap: 17592186044409MB (18014398509475696kB)
(XEN) Cannot handle page request order 1!
(XEN) Xen BUG at page_alloc.c:343
(XEN) FIXME: implement ia64 dump_execution_state()

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-164.el5
xen-3.0.3-94.el5

How reproducible:
Always

Steps to Reproduce:
1. boot xen 3.1.2-164.el5 on ia64
2. watch errors on console

Actual results:
Xen panics and reboots.

Expected results:
Xen works

Additional info:
The reported number of kilobytes, 18014398509475696, is exactly ((uint64_t) -6438912) >> 10
... so the problem is probably not a signed/unsigned mix-up, but a genuine overflow that yields a negative number of bytes for the Xen heap size. That number is then shifted right by 10 bits to get the kB and by 20 bits to get the MB.
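The arithmetic can be checked with a minimal C sketch. The helper name `bytes_to_units` is hypothetical and not taken from the Xen sources; it only reproduces the reinterpret-then-shift step described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the Xen sources): reinterpret a signed
 * byte count as unsigned, then shift right to get kB (shift 10) or
 * MB (shift 20). */
static uint64_t bytes_to_units(int64_t bytes, unsigned shift)
{
    assert(shift < 64);               /* a shift of 64 or more is undefined */
    return (uint64_t)bytes >> shift;  /* logical shift, no sign extension */
}
```

Called with -6438912 bytes, the helper returns 18014398509475696 for a shift of 10 and 17592186044409 for a shift of 20, which are the kB and MB figures on the "Xen heap:" line of the failing boot.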
Here is how the log matches the code in arch/ia64/xen/xensetup.c:

(XEN) xen image pstart: 0x4000000, xenheap pend: 0x8000000
>>>> xen_pstart = 0x4000000
>>>> xenheap_phys_end = 0x8000000
>>>> xenheap_size = 0x4000000
(XEN) find_memory: efi_memmap_walk returns max_page=221feff9
(XEN) Before xen_heap_start: f0000000041e0c80
(XEN) After xen_heap_start: f000000008624000

The problem is that the end of the Xen heap is the physical address 0x8624000 which is beyond the statically allocated space.

Can you try this patch?

diff --git a/include/asm-ia64/config.h b/include/asm-ia64/config.h
index 12c9bf9..1ed4fd2 100644
--- a/include/asm-ia64/config.h
+++ b/include/asm-ia64/config.h
@@ -113,8 +113,8 @@ extern char _end[]; /* standard ELF symbol */
 ///////////////////////////////////////////////////////////////
 // xen/include/asm/config.h
 // Natural boundary upon TR size to define xenheap space
-#define XENHEAP_DEFAULT_MB (1 << (KERNEL_TR_PAGE_SHIFT - 20))
-#define XENHEAP_DEFAULT_SIZE (1 << KERNEL_TR_PAGE_SHIFT)
+#define XENHEAP_DEFAULT_MB (4 << (KERNEL_TR_PAGE_SHIFT - 20))
+#define XENHEAP_DEFAULT_SIZE (4 << KERNEL_TR_PAGE_SHIFT)
 #define ELFSIZE 64
 ///////////////////////////////////////////////////////////////
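As a rough cross-check, the sketch below subtracts the two addresses from the log. It assumes the remaining heap is simply xenheap_phys_end minus the physical heap start; that formula is an assumption for illustration (the actual xensetup.c code is not quoted in this report), and `heap_bytes_left` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the failing boot log. */
#define XENHEAP_PHYS_END 0x8000000ULL /* "xenheap pend" */
#define HEAP_START_PHYS  0x8624000ULL /* "After xen_heap_start", as a physical address */

/* Hypothetical helper: signed number of bytes between the heap start and
 * the heap end. A negative result means the start already lies beyond
 * the end of the reserved area. */
static int64_t heap_bytes_left(uint64_t start, uint64_t end)
{
    assert(start <= (uint64_t)INT64_MAX);  /* keep the signed math well defined */
    assert(end <= (uint64_t)INT64_MAX);
    return (int64_t)end - (int64_t)start;
}
```

With the values above the result is -6438912 bytes, the same negative byte count that was derived from the reported kB figure.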
Related upstream c/s are 19109, 19130 and possibly 19129.
I placed test kernels at http://people.redhat.com/pbonzini/bz521865/ Please report if these packages fix the bug. Thanks!
(In reply to comment #6)
> I placed test kernels at http://people.redhat.com/pbonzini/bz521865/
>
> Please report if these packages fix the bug. Thanks!

The provided rpm kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm is truncated.

[root@hp-rx8640-03 ~]# rpmsign -K kernel-*
kernel-2.6.18-164.el5.pbtest.ia64.rpm: sha1 md5 OK
kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm: sha1 MD5 NOT OK

1d5cb7992e0cae0604cd5aeceb612197 pub/kernel-2.6.18-164.el5.pbtest.ia64.rpm
1d5cb7992e0cae0604cd5aeceb612197 brew/kernel-2.6.18-164.el5.pbtest.ia64.rpm
c68acdc13617b4e19488f5e740e5623b pub/kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm
33104c52cfd01cdded4a905f93c534e2 brew/kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm

Using the rpms from your brew build.
Using the provided rpms (kernel-2.6.18-164.el5.pbtest.ia64.rpm, kernel-xen-2.6.18-164.el5.pbtest.ia64.rpm), Xen boots and passes basic smoke testing (xen pv). Xen boot log attached.
Created attachment 362224 [details] Xen log - 3.1.2-164.el5.pbtest - OK
Created attachment 362490 [details] patch adding a "mem" commandline param
The new patch will require release notes.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: On ia64 systems, the Xen hypervisor is now able to run on systems with 128GB of memory or more. However, for these systems to boot successfully, the "mem" command-line argument has to be passed to Xen. For example, on a system with 128GB of memory the elilo.conf file should include the directive append="mem=128G --"
*** Bug 508651 has been marked as a duplicate of this bug. ***
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1,3 @@
-On ia64 systems, the Xen hypervisor is now able to run on systems with 128GB of memory or more. However, for these systems to boot successfully, the "mem" command-line argument has to be passed to Xen. For example, on a system with 128GB of memory the elilo.conf file should include the directive
+The "xenheap_megabytes" hypervisor option is now supported on ia64 systems as well. The option can be used to run the Xen hypervisor on ia64 systems with more than 64GB of RAM. If the installed memory exceeds 64GB, it is suggested to set the option to a value equal to the memory size in gigabytes. For example, on a system with 128GB of memory the elilo.conf file should include the directive

- append="mem=128G --"
+ append="xenheap_megabytes=128 --"
Created attachment 363182 [details] patch adding a "xenheap_megabytes" commandline param
in kernel-2.6.18-172.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
*** Bug 563813 has been marked as a duplicate of this bug. ***
Reproduced on RHEL5.4, kernel panic.
Verified on -190.el5 kernel. Dom0 successfully booted.

[ASCII-art banner: Xen 3.1.2-190.el5]
 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory
 Xen version 3.1.2-190.el5 (mockbuild) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Mon Feb 22 19:11:19 EST 2010
 Latest ChangeSet: unavailable
(XEN) Xen command line: BOOT_IMAGE=scsi0:EFI\redhat\xen.gz-2.6.18-190.el5 xenheap_megabytes=128
(XEN) xen image pstart: 0x4000000, xenheap pend: 0xc000000
(XEN) Xen patching physical address access by offset: 0x0
(XEN) find_memory: efi_memmap_walk returns max_page=221feff9
(XEN) Before xen_heap_start: f0000000041e0c80
(XEN) After xen_heap_start: f000000008624000
(XEN) warning: skipping physical page 0
(XEN) Init boot pages: 0x4000 -> 0x4000000.
(XEN) Init boot pages: 0xc000000 -> 0x1fffc000.
(XEN) Init boot pages: 0x70020000000 -> 0x705fb000000.
(XEN) Init boot pages: 0x78000000000 -> 0x787fc000000.
(XEN) Init boot pages: 0x80000000000 -> 0x807fc000000.
(XEN) Init boot pages: 0x88000000000 -> 0x887f9b76000.
(XEN) Init boot pages: 0x887fa609130 -> 0x887fa99f000.
(XEN) Init boot pages: 0x887faf7d6ed -> 0x887faf90018.
(XEN) Init boot pages: 0x887faf90df8 -> 0x887faf92018.
(XEN) Init boot pages: 0x887faf92078 -> 0x887faf95fa0.
(XEN) Init boot pages: 0x887faf95fea -> 0x887fbd9c000.
(XEN) Init boot pages: 0x887fbe80000 -> 0x887fbfe4000.
(XEN) System RAM: 122606MB (125549552kB)
(XEN) size of virtual frame_table: 306736kB
(XEN) virtual machine to physical table: f3fffffeef008038 size: 61488kB
(XEN) max_page: 0x221feff9
(XEN) Xen heap: 57MB (59248kB)
(XEN) Reserving non-aligned node boundary @ mfn 469794816
(XEN) Reserving non-aligned node boundary @ mfn 503316480
(XEN) Reserving non-aligned node boundary @ mfn 536870912
(XEN) Reserving non-aligned node boundary @ mfn 570425344
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) avail:0x3170074000000000, status:0x74000000000,control:0x3170010000000000, vm?0x10000000000
(XEN) WARNING: no opcode provided from hardware(0)!!!
(XEN) vm buffer size: 1048576, order: 6
(XEN) vm_buffer: 0xf000000008700000
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Time init:
(XEN) .... System Time: 25122ns
(XEN) .... scale: 280000000
(XEN) num_online_cpus=1, max_cpus=64
(XEN) Brought up 32 CPUs
(XEN) xenoprof: using perfmon.
(XEN) perfmon: version 2.0 IRQ 238
(XEN) perfmon: Montecito PMU detected, 27 PMCs, 35 PMDs, 12 counters (47 bits)
(XEN) Maximum number of domains: 63; 18 RID bits per domain
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Maximum permitted dom0 size: 121883MB
(XEN) elf_parse_binary: phdr: paddr=0x4000000 memsz=0x8cdd00
(XEN) elf_parse_binary: phdr: paddr=0x48d0000 memsz=0x8b90
(XEN) elf_parse_binary: phdr: paddr=0x48e0000 memsz=0x5caee0
(XEN) elf_parse_binary: memory: 0x4000000 -> 0x4eaaee0
(XEN) elf_xen_addr_calc_check: VIRT_BASE unset, using 0x0
(XEN) elf_xen_addr_calc_check: ELF_PADDR_OFFSET unset, using 0x0
(XEN) elf_xen_addr_calc_check: addresses:
(XEN) virt_base = 0x0
(XEN) elf_paddr_offset = 0x0
(XEN) virt_offset = 0x0
(XEN) virt_kstart = 0x4000000
(XEN) virt_kend = 0x4eaaee0
(XEN) virt_entry = 0x400ff20
(XEN) Dom0 kernel: 64-bit, lsb, paddr 0x4000000 -> 0x4eaaee0
(XEN) METAPHYSICAL MEMORY ARRANGEMENT:
(XEN) Kernel image: 4000000->4eaaee0
(XEN) Entry address: 400ff20
(XEN) Init. ramdisk: 4eb0000 len 5de6ed
(XEN) Start info.: 4eac000->4eb0000
(XEN) Dom0 max_vcpus=4
(XEN) Dom0: 0xf000000008e54080
(XEN) enable lsapic entry: 0xf0000705fd7a01cc
(XEN) enable lsapic entry: 0xf0000705fd7a01d8
(XEN) enable lsapic entry: 0xf0000705fd7a01e4
(XEN) enable lsapic entry: 0xf0000705fd7a01f0
(XEN) DISABLE lsapic entry: 0xf0000705fd7a01fc
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0208
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0214
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0220
(XEN) DISABLE lsapic entry: 0xf0000705fd7a022c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0238
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0244
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0250
(XEN) DISABLE lsapic entry: 0xf0000705fd7a025c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0268
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0274
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0280
(XEN) DISABLE lsapic entry: 0xf0000705fd7a028c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0298
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02a4
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02b0
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02bc
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02c8
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02d4
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02e0
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02ec
(XEN) DISABLE lsapic entry: 0xf0000705fd7a02f8
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0304
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0310
(XEN) DISABLE lsapic entry: 0xf0000705fd7a031c
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0328
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0334
(XEN) DISABLE lsapic entry: 0xf0000705fd7a0340
(XEN) Success Disabling SRAT
(XEN) Success Disabling SLIT
(XEN) Domain0 EFI passthrough: ACPI 2.0=0x705fd7a0000 SMBIOS=0x1fffe000
(XEN) Scrubbing Free RAM: 
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................done. (XEN) Xen trace buffers: disabled (XEN) Std. Loglevel: Errors and warnings (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings) (XEN) Xen is relinquishing VGA console. (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen). 
(XEN) Linux version 2.6.18-190.el5xen (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Mon Feb 22 19:34:51 EST 2010
(XEN) EFI v1.00 by Xen/ia64: SALsystab=0x4178 ACPI 2.0=0x705fd7a0000 SMBIOS=0x1fffe000
(XEN) booting generic kernel on platform hpzx1
(XEN) rsvd_region[0]: [0xe000000000004228, 0xe000000000004a70)
(XEN) rsvd_region[1]: [0xe000000001000000, 0xe000000001000048)
(XEN) rsvd_region[2]: [0xe000000004000000, 0xe000000004eaaee0)
(XEN) rsvd_region[3]: [0xe000000004eac000, 0xe00000000548e6ed)
(XEN) rsvd_region[4]: [0xffffffffffffffff, 0xffffffffffffffff)
(XEN) Initial ramdisk at: 0xe000000004eb0000 (6153965 bytes)
(XEN) SAL 0.1: Xen/ia64 Xen/ia64 version 0.0
(XEN) SAL: AP wakeup using external interrupt vector 0xf3
(XEN) No logical to physical processor mapping available
(XEN) vcpu_set_itc: Setting ar.itc is currently disabled (this message is only displayed once)
(XEN) cpu_init: PAL max_purges is overridden to 1 PALO is required for multiple outsanding ptc.g
(XEN) ACPI: Local APIC address c0000000fee00000
(XEN) <G><2>mm.c:1164:d0 efi_mmio: physaddr 0xf801001c800 size = 0x4000
(XEN) GSI 16 (level, low) -> CPU 0 (0x0000) vector 49
(XEN) 4 CPUs available, 32 CPUs total
(XEN) Running on Xen! start_info_pfn=0x13ab nr_pages=262144 flags=0x3
(XEN) MCA related initialization done
(XEN) Virtual mem_map starts at 0xa0007ff9df21c000
(XEN) SMP: Allowing 32 CPUs, 28 hotplug CPUs
(XEN) Built 1 zonelists. Total pages: 257754
(XEN) Kernel command line: root=/dev/VolGroup00/LogVol00 rhgb quiet ro
(XEN) <2>arch_boot_vcpu: vcpu 1 awaken
(XEN) WARN: GSI 23 in use by Xen.
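The "Xen heap: 57MB (59248kB)" line in this boot log is consistent with the addresses printed earlier in the same log. A minimal sketch, assuming the reported size is "xenheap pend" minus the physical heap start; the helper name `heap_units` is hypothetical and the formula is an assumption, not code quoted from Xen:

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the -190.el5 boot log. */
#define XENHEAP_PEND    0xc000000ULL /* "xenheap pend" with xenheap_megabytes=128 */
#define HEAP_START_PHYS 0x8624000ULL /* "After xen_heap_start", as a physical address */

/* Hypothetical helper: heap size in units of 2^shift bytes
 * (shift 10 for kB, shift 20 for MB, rounded down). */
static uint64_t heap_units(uint64_t start, uint64_t end, unsigned shift)
{
    assert(end >= start);  /* the heap start must not lie beyond the end */
    assert(shift < 64);
    return (end - start) >> shift;
}
```

For these values the helper gives 59248 kB and 57 MB, matching the log, so the heap start now falls inside the reserved area.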
Created attachment 397620 [details]
xm dmesg on -190

xm dmesg on successful boot of -190.el5xen kernel.
Created attachment 397621 [details] dom dmesg on -190 dom0 dmesg on -190.el5xen kernel
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
*** Bug 583598 has been marked as a duplicate of this bug. ***
*** Bug 645745 has been marked as a duplicate of this bug. ***