Bug 217770

Summary:	RHEL4u4 x86_64 FV guest with >4GB memory results in guest hang at boot
Product:	Red Hat Enterprise Linux 5	Reporter:	Jan Mark Holzer <jmh>
Component:	xen	Assignee:	Steven Rostedt <srostedt>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5.0	CC:	berrange, dshaks, srostedt, xen-maint, yunfeng.zhao
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	5.0.0	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2007-01-26 20:06:41 UTC	Type:	---

Description Jan Mark Holzer 2006-11-29 20:52:15 UTC

Description of problem:

Booting a RHEL4u4 x86_64 fully virtualized (FV) guest with more than 4020MB of
memory results in a hang of the guest.
The same FV guest boots just fine with a memory config smaller than 4020MB.
A RHEL4u4 i386 guest also boots fine with more than 4GB of memory (tried up
to 8GB), and the x86_64 RHEL4u4 PV guest works fine as well.


The x86_64 guest hangs while starting the network (configuring ethX) and stops
showing any progress (the rotating icon on the RHGB boot stops as well).
 

Version-Release number of selected component (if applicable):

RHEL5 Beta2 for the hypervisor
Linux woodie.lab.boston.redhat.com 2.6.18-1.2747.el5xen #1 SMP Thu Nov 9
19:08:55 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

The guest config is RHEL4u4 x86_64 (tried all 3 kernel variants: EL, SMP and
large SMP); all show the same problem.


How reproducible:


Steps to Reproduce:
1. configure a RHEL4u4 FV x86_64 guest with >4GB of memory
2. boot the guest
3. guest will hang (hard) at boot
  
Actual results:

Guest will consistently hang upon every boot

Expected results:

Able to configure >4GB in an x86_64 FV guest


Additional info:

This problem is consistently reproducible on woodie.lab.boston.redhat.com
For access please contact me

Can provide additional data/logs or run tests as needed

Comment 1 Jay Turner 2006-12-01 14:27:35 UTC

QE ack for RHEL5.

Comment 2 Stephen Tweedie 2006-12-08 22:58:21 UTC

May be related to bug 218820

Comment 3 Stephen Tweedie 2006-12-08 22:59:47 UTC

or bug 218822, another 4G limit.

Comment 5 Zhao Yunfeng 2006-12-14 14:48:52 UTC

The fixes for this issue are among xen unstable 11765~11831.

11830 is necessary to fix this problem:
http://xenbits.xensource.com/xen-unstable.hg?cs=a855c7d3a536
More testing is still needed to find the other fixes.

Comment 6 Zhao Yunfeng 2006-12-15 11:51:23 UTC

xen unstable 11853 is another necessary fix for this issue:
http://xenbits.xensource.com/xen-unstable.hg?cs=c3602d217110

Comment 9 Jan Mark Holzer 2006-12-16 13:21:11 UTC

Just some info from yesterday's IRC :

Woodie is currently running :

 Linux woodie.lab.boston.redhat.com 2.6.18-1.2879.el5xen #1 SMP Fri Dec 15
17:54:00 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

Looking into xm dmesg after starting our RHEL4U4 HVM guest shows the following:
 Booted with memory = 2048
 (XEN) (GUEST: 2) Memory size 2048 MB

  Booted with memory = 4096 which also results in a hard hang when starting the
  network (ifup eth0)
 (XEN) (GUEST: 3) Memory size 3840 MB
 
  The i386 variant of the RHEL4U4 HVM guest works fine (tested up to 8192)
  
  Also tried the RHEL4.5 PV kernel and as expected it worked :
 
 [root@dhcp78-70 ~]# free -m
              total       used       free     shared    buffers     cached
 Mem:          8192        234       7957          0          4         25
 -/+ buffers/cache:        203       7988
 Swap:          509          0        509
 [root@dhcp78-70 ~]# grep Mem /proc/meminfo 
 MemTotal:      8388608 kB
 MemFree:       8148976 kB

  Some additional info from inside the RHEL4U4 guest:

  I also noticed the following entries in /var/log/messages
  when I start the network manually in the RHEL4U4 HVM guest
  configured with 4096 memory (i.e. it hangs when bringing
  eth0 up with a manual ifup eth0). If I start the guest
  with 2048, none of the skbuff messages are logged and the guest
  works just fine.

 Dec 15 19:33:53 woodie avahi-daemon[3858]: Registering new address record for fe80::70c2:56ff:fe29:db59 on tap0.
 Dec 15 19:38:12 woodie kernel: Attempt to allocate order 5 skbuff. Increase MAX_SKBUFF_ORDER.
 Dec 15 19:38:43 woodie last message repeated 5613 times
 Dec 15 19:39:21 woodie last message repeated 6940 times
 Dec 15 19:39:22 woodie kernel: xenbr0: port 4(vif7.0) entering disabled state
 Note the skbuff message.

 Some additional information requested by Rik, such as the e820 map for a
 boot with 2048MB:

 Linux version 2.6.9-42.ELsmp (bhcompile.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:32:02 EDT 2006
  BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
  BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000a0000 - 00000000000c0000 type 16
  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
  BIOS-e820: 000000007fff0000 - 000000007fffa000 (ACPI data)
  BIOS-e820: 000000007fffa000 - 000000007fffd000 (ACPI NVS)
  BIOS-e820: 000000007fffd000 - 000000007fffe000 type 19
  BIOS-e820: 000000007fffe000 - 000000007ffff000 type 18
  BIOS-e820: 000000007ffff000 - 0000000080000000 type 17
  BIOS-e820: 00000000fec00000 - 0000000100000000 type 16

 and now for a boot with 4096MB 

 Linux version 2.6.9-42.ELsmp (bhcompile.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:32:02 EDT 2006
  BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
  BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000a0000 - 00000000000c0000 type 16
  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 00000000efff0000 (usable)
  BIOS-e820: 00000000efff0000 - 00000000efffa000 (ACPI data)
  BIOS-e820: 00000000efffa000 - 00000000efffd000 (ACPI NVS)
  BIOS-e820: 00000000efffd000 - 00000000efffe000 type 19
  BIOS-e820: 00000000efffe000 - 00000000effff000 type 18
  BIOS-e820: 00000000effff000 - 00000000f0000000 type 17
  BIOS-e820: 00000000fec00000 - 0000000100000000 type 16
  BIOS-e820: 0000000100000000 - 000000010a100000 (usable)

Comment 10 Steven Rostedt 2006-12-21 05:03:26 UTC

Running RHEL5 Beta 2 FV crashes almost immediately if you make the memory
greater than 3840 MB.

It doesn't seem that the code in Xen (max_pages) is taking into account all the
holes made by IO. When I set memory to 3841 I get this mapping:

 BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
 BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000a0000 - 00000000000c0000 type 16
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000efff0000 (usable)
 BIOS-e820: 00000000efff0000 - 00000000efffa000 (ACPI data)
 BIOS-e820: 00000000efffa000 - 00000000efffd000 (ACPI NVS)
 BIOS-e820: 00000000efffd000 - 00000000efffe000 type 19
 BIOS-e820: 00000000efffe000 - 00000000effff000 type 18
 BIOS-e820: 00000000effff000 - 00000000f0000000 type 17
 BIOS-e820: 00000000fec00000 - 0000000100000000 type 16
 BIOS-e820: 0000000100000000 - 0000000100100000 (usable)

And when it tries to map a page into that 0x100000000 - 0x100100000 range we get
a crash.  Here's the debug I put into xen:

in xen/arch/x86/hvm/vmx/vmx.c:

            HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value);
            if ( ((value >> PAGE_SHIFT) > v->domain->max_pages ) ||
                 !VALID_MFN(mfn = get_mfn_from_gpfn(value >> PAGE_SHIFT)) ||
                 !get_page(mfn_to_page(mfn), v->domain) )
            {
                printk ("value>>PAGE_SHIFT=%lx\n", value >> PAGE_SHIFT);
                printk ("max_pages=%x\n",v->domain->max_pages);
                printk ("valid=%d\n",VALID_MFN(mfn=get_mfn_from_gpfn(value >> PAGE_SHIFT)));
                printk("Invalid CR3 value=%lx\n", value);
                domain_crash_synchronous(); /* need to take a clean path */

And here's the output:

(XEN) value>>PAGE_SHIFT=1000d7
(XEN) max_pages=f1e01
(XEN) valid=1
(XEN) Invalid CR3 value=1000d7000
(XEN) domain_crash_sync called from vmx.c:1684

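0xf1e01 << 12 >> 20 is 3870 (MB), which is bigger than the needed 3841, but it
seems that max_pages doesn't take into account the I/O holes that sit between
the RAM ranges: the guest frame 0x1000d7 lies above the hole, so it is larger
than max_pages even though the guest has fewer pages than that in total.
So max_pages is not set to what we really need.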

I also put a debug statement in domctl:

        if ( new_max >= d->tot_pages )
        {
            printk("making new max pages %lx\n",new_max);
            d->max_pages = new_max;

Which gave me:

(XEN) making new max pages f1e01

So we also need to find out who is setting this, and figure out exactly what's
going on.

Also note:

In vmxassist tools:

	if (memory_size > 0x3bc000)
		memory_size = 0x3bc000;
	memory_size = (memory_size << 10) + 0xF00000;
	if (memory_size <= 0xF00000)
		memory_size =
		    (((get_cmos(0x31) << 8) | get_cmos(0x30)) + 0x400) << 10;
	memory_size += 0x400 << 10; /* + 1MB */

Where you will find ((0x3bc000<<10) + 0xf00000 + (0x400 << 10))>>20 = 0xf00 =
3840. Which just so happens to be the breaking point of our code!

Comment 11 Steven Rostedt 2006-12-22 05:07:46 UTC

I tested out xen-unstable and it can't boot the HVM kernel whatsoever.

I then tested xen-3.0.4-testing, and it can. Not only that, by using
3.0.4-testing kernel (2.6.16.33-xen) and HV and tools, I was able to boot a > 4G
HVM. So it's time to do some patch hunting (in the hg logs of xen-testing).

So far I've found 13061:6cbed96fedac:
summary:     Clean-up hvm/shadow interaction around cr3 updates.

and 12759:67a06a9b7b1d
summary:     [HVM] qemu: Add guest address-space mapping cache.

When I get back from the holidays, I'll look more into these.

Comment 13 Steven Rostedt 2007-01-08 21:05:26 UTC

Upstream changeset 11853 fixes the problem.  Patch posted for ACK.

Comment 14 Jay Turner 2007-01-10 15:30:38 UTC

Built into 2.6.18-1.3002.el5.

Comment 15 Jay Turner 2007-01-26 20:06:41 UTC

2.6.18-7.el5 included in 20070125.0.