Bug 716498

Summary: bump domain memory limits
Product: Red Hat Enterprise Linux 6 Reporter: Andrew Jones <drjones>
Component: kernel Assignee: Igor Mammedov <imammedo>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 6.2 CC: arozansk, imammedo, james.brown, jgreguske, leiwang, mjenner, pbonzini, pcao, qguan, qwan, sghosh, shwang, xen-maint, yuzhou
Target Milestone: rc Keywords: EC2
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-176.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 669739 Environment:
Last Closed: 2011-12-06 13:43:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 669739, 743590    
Bug Blocks: 653816, 716911    
Attachments:
Description | Flags
[1/15] x86: add RESERVE_BRK_ARRAY() helper | none
[2/15] xen: dynamically allocate p2m space | none
[3/15] xen: allocate p2m size based on actual max size | none
[4/15] xen: set shared_info->arch.max_pfn to max_p2m_pfn | none
[5/15] xen: make install_p2mtop_page() static | none
[6/15] xen: convert p2m to a 3 level tree | none
[7/15] xen: use early_brk for level2_kernel_pgt | none
[8/15] xen: allocate level1_ident_pgt | none
[9/15] xen: add return value to set_phys_to_machine() | none
[10/15] xen: defer building p2m mfn structures until kernel is mapped | none
[11/15] xen: don't map missing memory | none
[12/15] xen: correctly rebuild mfn list list after migration. | none
[13/15] xen: annotate functions which only call into __init at start of day | none
[14/15] xen: bump memory limit for x86 domU PV guest to 128Gb | none
[15/15] Unset CONFIG_DEBUG_FORCE_WEAK_PER_CPU on x86/x86_64 platforms | none
[RHEL6.2 Xen PATCH 16/17] xen: correct size of level2_kernel_pgt | none
[17/17] x86-64: Only set max_pfn_mapped to 512 | none
64-bit guest crash backtrace at early boot stage and dmesg | none
xm dmesg related to crash | none

Comment 2 RHEL Program Management 2011-06-24 16:39:51 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 3 Igor Mammedov 2011-06-30 13:44:39 UTC
Created attachment 510668 [details]
[1/15] x86: add RESERVE_BRK_ARRAY() helper

Comment 4 Igor Mammedov 2011-06-30 13:45:13 UTC
Created attachment 510669 [details]
[2/15] xen: dynamically allocate p2m space

Comment 5 Igor Mammedov 2011-06-30 13:45:41 UTC
Created attachment 510670 [details]
[3/15] xen: allocate p2m size based on actual max size

Comment 6 Igor Mammedov 2011-06-30 13:46:10 UTC
Created attachment 510671 [details]
[4/15] xen: set shared_info->arch.max_pfn to max_p2m_pfn

Comment 7 Igor Mammedov 2011-06-30 13:46:34 UTC
Created attachment 510672 [details]
[5/15] xen: make install_p2mtop_page() static

Comment 8 Igor Mammedov 2011-06-30 13:47:08 UTC
Created attachment 510673 [details]
[6/15] xen: convert p2m to a 3 level tree

Comment 9 Igor Mammedov 2011-06-30 13:47:45 UTC
Created attachment 510674 [details]
[7/15] xen: use early_brk for level2_kernel_pgt

Comment 10 Igor Mammedov 2011-06-30 13:48:18 UTC
Created attachment 510675 [details]
[8/15] xen: allocate level1_ident_pgt

Comment 11 Igor Mammedov 2011-06-30 13:48:42 UTC
Created attachment 510676 [details]
[9/15] xen: add return value to set_phys_to_machine()

Comment 12 Igor Mammedov 2011-06-30 13:49:11 UTC
Created attachment 510677 [details]
[10/15] xen: defer building p2m mfn structures until kernel is mapped

Comment 13 Igor Mammedov 2011-06-30 13:49:46 UTC
Created attachment 510678 [details]
[11/15] xen: don't map missing memory

Comment 14 Igor Mammedov 2011-06-30 13:50:17 UTC
Created attachment 510679 [details]
[12/15] xen: correctly rebuild mfn list list after migration.

Comment 15 Igor Mammedov 2011-06-30 13:50:43 UTC
Created attachment 510680 [details]
[13/15] xen: annotate functions which only call into __init at start of day

Comment 16 Igor Mammedov 2011-06-30 13:51:13 UTC
Created attachment 510681 [details]
[14/15] xen: bump memory limit for x86 domU PV guest to 128Gb

Comment 17 Igor Mammedov 2011-06-30 13:51:37 UTC
Created attachment 510682 [details]
[15/15] Unset CONFIG_DEBUG_FORCE_WEAK_PER_CPU on x86/x86_64 platforms

Comment 18 Igor Mammedov 2011-06-30 14:00:19 UTC
Patch set description:

Brew: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3445480

Boot tested on pq2-0.rhts.eng.bos.redhat.com
  x86 guest: memory sizes 256M/512M/1024M/16G/32G
      the guest sees at most 16G because of commit
      41b23f52 [mm] Limit 32-bit x86 systems to 16GB and prevent
                    panic on boot when system has more than ~30GB
  x86_64 guest: memory sizes 256M/512M/1024M/16G/32G/64G/128G

Pull in the upstream conversion of the p2m to a 3-level tree, so that
it can cover the full possible physical address space. That will allow
a domU to use up to 128Gb of RAM without wasting RAM when running
on small-RAM configs.
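
As a rough illustration of why three levels are enough, the index arithmetic of such a tree can be sketched as below. This is not code from the attached patches: it assumes 4 KiB pages and 8-byte entries (as on x86_64), with every level being one page of entries, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not taken from the attached patches): 4 KiB pages and
 * 8-byte entries, as on x86_64. All names here are hypothetical. */
#define SKETCH_PAGE_SIZE   4096ULL
#define SKETCH_ENTRY_SIZE  8ULL
#define SKETCH_PER_PAGE    (SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)
#define SKETCH_MAX_PFN     (SKETCH_PER_PAGE * SKETCH_PER_PAGE * SKETCH_PER_PAGE)

/* Slot in the top-level page that covers this pfn. */
static inline uint64_t sketch_top_index(uint64_t pfn)
{
    return pfn / (SKETCH_PER_PAGE * SKETCH_PER_PAGE);
}

/* Slot in the mid-level page that covers this pfn. */
static inline uint64_t sketch_mid_index(uint64_t pfn)
{
    return (pfn / SKETCH_PER_PAGE) % SKETCH_PER_PAGE;
}

/* Slot in the leaf page that holds the mfn for this pfn. */
static inline uint64_t sketch_leaf_index(uint64_t pfn)
{
    return pfn % SKETCH_PER_PAGE;
}

/* Bytes of guest pseudo-physical memory the whole tree can describe. */
static inline uint64_t sketch_max_bytes(void)
{
    return SKETCH_MAX_PFN * SKETCH_PAGE_SIZE;
}
```

Under these assumptions the tree can describe well beyond 128Gb, and mid-level and leaf pages only need to exist for the ranges a guest actually uses, which is consistent with the point about not wasting RAM on small configurations.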

In addition, I boot tested HVM guests in the same memory ranges just in case; no problems were observed.

Comment 19 Paolo Bonzini 2011-06-30 16:14:22 UTC
Since all the patches were attached here too, I'm including the mailing list review for posterity.

Patch 7 needs to have commit a2d771c0 (xen: correct size of level2_kernel_pgt, 2010-10-29) squashed in.  Also, please include in the backport commit 33a8475 (xen: defer building p2m mfn structures until kernel is mapped, 2010-08-27) and probably also commit 67e87f0 (x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S, 2010-10-13).

Comment 20 Igor Mammedov 2011-07-01 08:09:20 UTC
Paolo,

commit 33a8475 (xen: defer building p2m mfn structures until kernel is mapped)
is already included; it's patch 10.

I'll add the other patches you pointed out and retest/resend.

Comment 21 Igor Mammedov 2011-07-04 09:46:57 UTC
Created attachment 511155 [details]
[RHEL6.2 Xen PATCH 16/17] xen: correct size of level2_kernel_pgt

Comment 22 Igor Mammedov 2011-07-04 09:47:58 UTC
Created attachment 511156 [details]
[17/17] x86-64: Only set max_pfn_mapped to 512

Comment 23 Aristeu Rozanski 2011-07-12 15:29:38 UTC
Patch(es) available on kernel-2.6.32-167.el6

Comment 26 Igor Mammedov 2011-07-25 18:27:41 UTC
Previously I had only tested booting an x86_64 PV guest from an image using the pygrub bootloader, and with pygrub the guest boots fine.

However, this patch set causes a regression when booting an x86_64 PV guest with the following xm config:
-------------
maxmem = 4096
memory = 512
vcpus = 1
kernel = "mp2mkernel"
extra = "ignore_loglevel console=hvc0 "
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "preserve"
------------


The problem started to appear with the use of extend_brk in this patch set.
In RHEL6 there is another commit, 2d5f59f50, that fixes a guest boot failure
by removing the use of extend_brk.

Bisecting upstream doesn't help: the crash magically disappears with an unrelated commit. The failing address is always the same; however, after adding extra debugging code, the failing address changes to another constant. It looks like the crash point depends on the kernel size.

So far there is no known fix, thus I'm reopening this bug.

Comment 27 Igor Mammedov 2011-07-25 18:31:16 UTC
Created attachment 515122 [details]
64-bit guest crash backtrace at early boot stage and dmesg

Comment 28 Igor Mammedov 2011-07-25 18:36:23 UTC
Created attachment 515124 [details]
xm dmesg related to crash

Comment 29 Igor Mammedov 2011-07-25 18:41:06 UTC
Created bug 725519 for revert.

PS:
 problematic guest kernel version 2.6.32-169

Comment 30 Andrew Jones 2011-07-26 11:16:59 UTC
I've decided to keep

91ceeef [virt] Unset CONFIG_DEBUG_FORCE_WEAK_PER_CPU on x86/x86_64 platforms
8d641fc [virt] xen: bump memory limit for x86 domU PV guest to 128Gb

91ceeef doesn't matter - it's a compile issue. However, keeping 8d641fc means that QA should test that 32-bit PV guests can now have, and function properly with, greater than 8G of memory. I'm going to move this BZ back to ON_QA so QA can do that testing (the verification effort remains the same as it would have been for the original patch series, even though we're reverting most of it). I've opened bug 725714 to use for 6.3 to reconsider the backport of the p2m tree.

Comment 33 Aristeu Rozanski 2011-08-02 13:58:59 UTC
Patch(es) available on kernel-2.6.32-176.el6

Comment 36 Qin Guan 2011-09-30 07:02:17 UTC
Test Fail with kernel-2.6.32-176.el6 or newer.

The guest crashes when guest memory is set to 128G:

Steps:
1. Use a RHEL5.7 host with kernel 2.6.18-286.el5xen
2. Start a RHEL6.2 guest with the -176 kernel and 70G or 100G of memory: no problem found
3. Set memory to 128G and start again: the guest crashes

xm dmesg:
------------------
(XEN) traps.c:405:d10 Unhandled invalid opcode fault/trap [#6] in domain 10 on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 10 (vcpu#0) crashed on cpu#91:
(XEN) ----[ Xen-3.1.2-286.el5  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    91
(XEN) RIP:    e033:[<ffffffff81004e96>]
(XEN) RFLAGS: 0000000000000212   CONTEXT: guest
(XEN) rax: ffffffff81c670e0   rbx: 0000000002000001   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: ffffffffffffffff   rdi: 0000000002000000
(XEN) rbp: ffffffff81a01e38   rsp: ffffffff81a01df8   r8:  0000000000000006
(XEN) r9:  ffffffff81c670d0   r10: 0000000000000000   r11: 0000005d00800000
(XEN) r12: 0000005d00800000   r13: 0000000007d00800   r14: 0000002000000000
(XEN) r15: 0000000000000003   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000c355b37000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81a01df8:
(XEN)    0000000000000000 0000005d00800000 ffffffff81004e96 000000010000e030
(XEN)    0000000000010012 ffffffff81a01e38 000000000000e02b ffffffff81004e39
(XEN)    ffffffff81a01e98 ffffffff81c239e4 ffffffff81a01e68 0000000000000000
(XEN)    ffffffff00000001 ffffffff81c65500 ffffffff81a01e78 ffffffff81c631a0
(XEN)    0000000000000000 ffffffff81a01f80 ffffffffffffffff 0000000000000000
(XEN)    ffffffff81a01eb8 ffffffff81c26a26 ffffffff00000001 ffffffff81c631a0
(XEN)    ffffffff81a01f68 ffffffff81c25115 ffffffff00000010 aba5cf4405dd1a5e
(XEN)    ffffffff81a01ee8 ffffffff81c631a0 0000000000000000 0000000000000000
(XEN)    ffffffffffffffff 0000000000000000 ffffffff81a01f68 ffffffff814eb5d7
(XEN)    0000000000000010 ffffffff81a01f78 ffffffff81a01f38 aba5cf4405dd1a5e
(XEN)    ffffffff81a01f58 ffffffff81c631a0 0000000000000000 0000000000000000
(XEN)    ffffffffffffffff 0000000000000000 ffffffff81a01fa8 ffffffff81c1fc2e
(XEN)    aba5cf4405dd1a5e ffffffff81c2737d 000000000200c7a4 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff81a01fc8 ffffffff81c1f33a
(XEN)    ffffffff81c145a0 ffffffff94ad6000 ffffffff81a01ff8 ffffffff81c23180
(XEN)    800822a11f898171 000106d17c080800 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff84ad3000 ffffffff84ad4000 ffffffff84ad5000
(XEN)    ffffffff84ad6000 ffffffff84ad7000 ffffffff84ad8000 ffffffff84ad9000
(XEN)    ffffffff84ada000 ffffffff84adb000 ffffffff84adc000 ffffffff84add000
(XEN)    ffffffff84ade000 ffffffff84adf000 ffffffff84ae0000 ffffffff84ae1000
(XEN) traps.c:1910:d9 Domain attempted WRMSR 000000000000008b from 00000029:00000000 to 00000000:00000000.
--------------------

PS. The test passes with kernel -175; the guest starts up OK with 128G of memory.

Comment 37 Igor Mammedov 2011-09-30 09:55:23 UTC
(In reply to comment #36)
> Test Fail with kernel-2.6.32-176.el6 or newer.

Do you still have the system where it happens reserved?

Comment 40 Igor Mammedov 2011-10-03 16:10:28 UTC
It hits the BUG_ON at arch/x86/xen/mmu.c:185, reached via
xen_memory_setup => xen_add_extra_mem => __set_phys_to_machine => p2m_top_index

   BUG_ON(pfn >= MAX_DOMAIN_PAGES);


The tools create an e820 map for the guest with map.size = guest mem + 8Mb,
so for 128Gb they set the guest's e820_map.size = 0x2000800000 (i.e. 128Gb + 8Mb),
which means 0x2000800 pages, while xen_start_info->nr_pages = 0x2000000.
As a result, the guest treats these extra 8Mb as extra_pages used for ballooning and tries to initialize them.

However, MAX_DOMAIN_PAGES is set to 0x2000000 (only 128Gb). As a result, we hit the above-mentioned BUG_ON when trying to initialize page 0x2000000.
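
The arithmetic above can be checked with a small sketch. It is only an illustration, not the kernel code: it assumes 4 KiB pages, takes the MAX_DOMAIN_PAGES value quoted in this comment, and uses hypothetical `sketch_` helper names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT        12            /* assumes 4 KiB pages */
#define SKETCH_MAX_DOMAIN_PAGES  0x2000000ULL  /* value quoted in this comment */

/* Number of page frames covered by an e820 region of `bytes` bytes. */
static inline uint64_t sketch_bytes_to_pages(uint64_t bytes)
{
    return bytes >> SKETCH_PAGE_SHIFT;
}

/* Pages beyond nr_pages that the guest would treat as extra (balloon) pages. */
static inline uint64_t sketch_extra_pages(uint64_t e820_bytes, uint64_t nr_pages)
{
    uint64_t e820_pages = sketch_bytes_to_pages(e820_bytes);

    return e820_pages > nr_pages ? e820_pages - nr_pages : 0;
}

/* Mirrors the condition checked by BUG_ON(pfn >= MAX_DOMAIN_PAGES). */
static inline int sketch_pfn_out_of_range(uint64_t pfn)
{
    return pfn >= SKETCH_MAX_DOMAIN_PAGES;
}
```

With e820_map.size = 0x2000800000 and nr_pages = 0x2000000, the first extra page is pfn 0x2000000, which is exactly the first value the range check rejects; the last regular page, 0x1FFFFFF, still passes.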

So the question is what to do.
Why is maxmem + 8Mb being set as the size of the guest's e820 map?

Comment 42 Qin Guan 2011-10-31 04:37:51 UTC
Verified with kernel-2.6.32-202.el6 (128G memory guest excluded).

Host Version: 
2.6.18-286.el5xen, x86_64
xen-3.0.3-134.el5

Guest Version:
kernel-2.6.32-202.el6, pv

Test Matrix:

Host(x86_64): 
- AMD, Intel
Guest Arch i386:
- Memory Size: 512M, 4G, 8G
Guest Arch x86_64:
- Memory Size: 1G, 4G, 16G, 32G, 70G, 128G
Guest CPU: 
- UP, SMP

For a guest with 128G of memory, the sanity test passes with kernel-2.6.32-211.el6. Please refer to bug 743590 comment 8.

Comment 43 errata-xmlrpc 2011-12-06 13:43:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html