Bug 217770
Summary: | RHEL4u4 x86_64 FV guest with >4GB memory results in guest hang at boot | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Jan Mark Holzer <jmh> |
Component: | xen | Assignee: | Steven Rostedt <srostedt> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | medium | Priority: | medium |
Version: | 5.0 | CC: | berrange, dshaks, srostedt, xen-maint, yunfeng.zhao |
Hardware: | All | OS: | Linux |
Fixed In Version: | 5.0.0 | Doc Type: | Bug Fix |
Last Closed: | 2007-01-26 20:06:41 UTC | | |
Description
Jan Mark Holzer
2006-11-29 20:52:15 UTC
QE ack for RHEL5.

May be related to bug 218820 or bug 218822, another 4G limit.

The fixes for this issue are among xen unstable 11765~11831. 11830 is necessary to fix this problem: http://xenbits.xensource.com/xen-unstable.hg?cs=a855c7d3a536
More tests are still needed to find the other fixes.

xen unstable 11853 is another necessary fix for this issue: http://xenbits.xensource.com/xen-unstable.hg?cs=c3602d217110

Just some info from yesterday's IRC:

Woodie is currently running:

    Linux woodie.lab.boston.redhat.com 2.6.18-1.2879.el5xen #1 SMP Fri Dec 15 17:54:00 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

Looking into xm dmesg after starting our RHEL4U4 HVM guest shows the following:

Booted with memory = 2048

    (XEN) (GUEST: 2) Memory size 2048 MB

Booted with memory = 4096, which also results in a hard hang when starting the network (ifup eth0)

    (XEN) (GUEST: 3) Memory size 3840 MB

The i386 variant of the RHEL4U4 HVM guest works fine (tested up to 8192).

Also tried the RHEL4.5 PV kernel and, as expected, it worked:

    [root@dhcp78-70 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:          8192        234       7957          0          4         25
    -/+ buffers/cache:        203       7988
    Swap:          509          0        509
    [root@dhcp78-70 ~]# grep Mem /proc/meminfo
    MemTotal:      8388608 kB
    MemFree:       8148976 kB

Some additional info from inside the RHEL4U4 guest:

Also noticed the following entries in /var/log/messages when I start the network manually in the RHEL4U4 HVM guest if it's configured with 4096 memory (i.e. it will hang when bringing eth0 up with a manual ifup eth0). If I start the guest with 2048, none of the skbuff messages are logged and the guest works just fine.

    Dec 15 19:33:53 woodie avahi-daemon[3858]: Registering new address record for fe80::70c2:56ff:fe29:db59 on tap0.
    Dec 15 19:38:12 woodie kernel: Attempt to allocate order 5 skbuff. Increase MAX_SKBUFF_ORDER.
    Dec 15 19:38:43 woodie last message repeated 5613 times
    Dec 15 19:39:21 woodie last message repeated 6940 times
    Dec 15 19:39:22 woodie kernel: xenbr0: port 4(vif7.0) entering disabled state

Note the skbuff message.

Some additional information requested by Rik, such as the e820 map for a boot with 2048MB:

    Linux version 2.6.9-42.ELsmp (bhcompile.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:32:02 EDT 2006
    BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
    BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
    BIOS-e820: 00000000000a0000 - 00000000000c0000 type 16
    BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
    BIOS-e820: 000000007fff0000 - 000000007fffa000 (ACPI data)
    BIOS-e820: 000000007fffa000 - 000000007fffd000 (ACPI NVS)
    BIOS-e820: 000000007fffd000 - 000000007fffe000 type 19
    BIOS-e820: 000000007fffe000 - 000000007ffff000 type 18
    BIOS-e820: 000000007ffff000 - 0000000080000000 type 17
    BIOS-e820: 00000000fec00000 - 0000000100000000 type 16

And now for a boot with 4096MB:

    Linux version 2.6.9-42.ELsmp (bhcompile.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:32:02 EDT 2006
    BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
    BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
    BIOS-e820: 00000000000a0000 - 00000000000c0000 type 16
    BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    BIOS-e820: 0000000000100000 - 00000000efff0000 (usable)
    BIOS-e820: 00000000efff0000 - 00000000efffa000 (ACPI data)
    BIOS-e820: 00000000efffa000 - 00000000efffd000 (ACPI NVS)
    BIOS-e820: 00000000efffd000 - 00000000efffe000 type 19
    BIOS-e820: 00000000efffe000 - 00000000effff000 type 18
    BIOS-e820: 00000000effff000 - 00000000f0000000 type 17
    BIOS-e820: 00000000fec00000 - 0000000100000000 type 16
    BIOS-e820: 0000000100000000 - 000000010a100000 (usable)

Running RHEL5 Beta 2 FV crashes almost immediately if you make the memory greater than 3840 MB.
It doesn't seem that the code in Xen (max_pages) is taking into account all the holes made by IO. When I set memory to 3841 I get this mapping:

    BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
    BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
    BIOS-e820: 00000000000a0000 - 00000000000c0000 type 16
    BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    BIOS-e820: 0000000000100000 - 00000000efff0000 (usable)
    BIOS-e820: 00000000efff0000 - 00000000efffa000 (ACPI data)
    BIOS-e820: 00000000efffa000 - 00000000efffd000 (ACPI NVS)
    BIOS-e820: 00000000efffd000 - 00000000efffe000 type 19
    BIOS-e820: 00000000efffe000 - 00000000effff000 type 18
    BIOS-e820: 00000000effff000 - 00000000f0000000 type 17
    BIOS-e820: 00000000fec00000 - 0000000100000000 type 16
    BIOS-e820: 0000000100000000 - 0000000100100000 (usable)

And when it tries to map a page into that 0x100000000 - 0x100100000 range we get a crash. Here's the debug I put into xen, in xen/arch/x86/hvm/vmx/vmx.c:

    HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value);
    if ( ((value >> PAGE_SHIFT) > v->domain->max_pages ) ||
         !VALID_MFN(mfn = get_mfn_from_gpfn(value >> PAGE_SHIFT)) ||
         !get_page(mfn_to_page(mfn), v->domain) )
    {
        printk ("value>>PAGE_SHIFT=%lx\n", value >> PAGE_SHIFT);
        printk ("max_pages=%x\n",v->domain->max_pages);
        printk ("valid=%d\n",VALID_MFN(mfn=get_mfn_from_gpfn(value >> PAGE_SHIFT)));
        printk("Invalid CR3 value=%lx\n", value);
        domain_crash_synchronous(); /* need to take a clean path */

And here's the output:

    (XEN) value>>PAGE_SHIFT=1000d7
    (XEN) max_pages=f1e01
    (XEN) valid=1
    (XEN) Invalid CR3 value=1000d7000
    (XEN) domain_crash_sync called from vmx.c:1684

0xf1e01 << 12 >> 20 is 3870 (MB), which is bigger than the needed 3841, but it seems that this doesn't take into account the IO holes that break up the RAM. So max_pages is not set to what we really need.
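The arithmetic behind that debug output can be checked with a small sketch. It only reproduces the numbers printed above and the first clause of the vmx.c condition; the helper names (`pages_to_mb`, `cr3_to_pfn`, `pfn_exceeds_max`) are made up for illustration and are not Xen functions, and 4 KB pages (`PAGE_SHIFT` of 12) are assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KB pages, as on x86 */

/* Convert a page count to whole megabytes (pages << 12 >> 20). */
uint64_t pages_to_mb(uint64_t pages)
{
    return (pages << PAGE_SHIFT) >> 20;
}

/* Guest page frame number that a CR3 value points at. */
uint64_t cr3_to_pfn(uint64_t cr3)
{
    return cr3 >> PAGE_SHIFT;
}

/*
 * Hypothetical helper: mirrors only the first clause of the quoted
 * check, (value >> PAGE_SHIFT) > max_pages. The VALID_MFN and
 * get_page clauses are not modelled here.
 */
int pfn_exceeds_max(uint64_t cr3, uint64_t max_pages)
{
    return cr3_to_pfn(cr3) > max_pages;
}
```

With the logged values, `pages_to_mb(0xf1e01)` is 3870 and `cr3_to_pfn(0x1000d7000)` is 0x1000d7. That frame number sits just above the 4 GB boundary (pfn 0x100000), so it is larger than max_pages even though max_pages covers more than 3841 MB worth of pages, which is consistent with the count being compared against a frame number that has been shifted up past the IO hole.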
Also I put in a debug statement in domctl:

    if ( new_max >= d->tot_pages ) {
        printk("making new max pages %lx\n",new_max);
        d->max_pages = new_max;

Which gave me:

    (XEN) making new max pages f1e01

So we need to also find out who is setting this, and figure out exactly what's going on.

Also note, in the vmxassist tools:

    if (memory_size > 0x3bc000)
        memory_size = 0x3bc000;
    memory_size = (memory_size << 10) + 0xF00000;
    if (memory_size <= 0xF00000)
        memory_size = (((get_cmos(0x31) << 8) | get_cmos(0x30)) + 0x400) << 10;
    memory_size += 0x400 << 10; /* + 1MB */

Where you will find ((0x3bc000<<10) + 0xf00000 + (0x400 << 10))>>20 = 0xf00 = 3840. Which just so happens to be the breaking point of our code!

I tested out xen-unstable and it can't boot the HVM kernel whatsoever. I then tested xen-3.0.4-testing, and it can. Not only that, by using the 3.0.4-testing kernel (2.6.16.33-xen), HV and tools, I was able to boot a > 4G HVM. So it's time to do some patch hunting (in the hg logs of xen-testing).

So far I've found

    13061:6cbed96fedac
    summary: Clean-up hvm/shadow interaction around cr3 updates.

and

    12759:67a06a9b7b1d
    summary: [HVM] qemu: Add guest address-space mapping cache.

When I get back from the holidays, I'll look more into these.

Upstream changeset 11853 fixes the problem. Patch posted for ACK.

Built into 2.6.18-1.3002.el5.

2.6.18-7.el5 included in 20070125.0.
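For reference, the 3840 figure from the vmxassist snippet quoted earlier in this report can be reproduced with a short sketch. It follows the quoted lines but leaves out the CMOS fallback (`get_cmos` is not modelled), and it treats `memory_size` as a count in KB, which is an inference from the `<< 10` rather than something stated in the report. The function name `vmxassist_mem_bytes` is hypothetical.

```c
#include <stdint.h>

/* Upper clamp copied from the quoted vmxassist snippet. */
#define VMXASSIST_CLAMP 0x3bc000u

/*
 * Hypothetical helper: applies the clamp and the two fixed offsets
 * from the quoted code. The branch that re-reads CMOS registers
 * 0x30/0x31 when the size is too small is omitted, so this is only
 * meaningful for non-zero inputs.
 */
uint64_t vmxassist_mem_bytes(uint64_t memory_size)
{
    if (memory_size > VMXASSIST_CLAMP)
        memory_size = VMXASSIST_CLAMP;
    memory_size = (memory_size << 10) + 0xF00000;
    memory_size += 0x400 << 10;  /* + 1MB */
    return memory_size;
}
```

Any input at or above the clamp yields 0xF0000000 bytes, and 0xF0000000 >> 20 is 0xf00, i.e. 3840 MB, matching the "Memory size 3840 MB" line that xm dmesg reported for the guest configured with 4096.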