Bug 540811 - [RHEL5 Xen]: PV guest crash on poweroff
Summary: [RHEL5 Xen]: PV guest crash on poweroff
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.6
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Chris Lalancette
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 526946
Reported: 2009-11-24 08:45 UTC by Chris Lalancette
Modified: 2010-04-08 16:21 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 541538
Last Closed: 2010-03-30 07:40:50 UTC
Target Upstream Version:

Attachments (Terms of Use)

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description Chris Lalancette 2009-11-24 08:45:20 UTC
Description of problem:
I was doing some save/restore testing of RHEL PV guests.  My dom0 is an AMD RevF with 2 dual-core processors, 2G of memory, running kernel 2.6.18-164.el5xen and xen-3.0.3-94.el5.  My guest is a 32-bit RHEL-5 PV guest running 2.6.18-164.el5xen, 4 vcpus, mem=512, maxmem=1500.  Here are the steps I performed:

1)  Boot up the PV guest (with 512M of memory)
2)  xm save the PV guest
3)  xm restore the PV guest
4)  xm mem-set <guest> 1500
5)  xm save the PV guest
6)  xm restore the PV guest
7)  poweroff inside the PV guests

On step 7), I got a crash that looks like:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
1e818000 -> *pde = 00000000:68800027
1f045000 -> *pme = 00000000:46822067
13a83000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
last sysfs file: /class/misc/autofs/dev
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh scsi_mod parport_pc lp parport xennet pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0061:[<c0452d86>]    Not tainted VLI
EFLAGS: 00010206   (2.6.18-164.el5xen #1) 
EIP is at mempool_alloc+0x22/0xc9
eax: 00011200   ebx: 00000010   ecx: df87fba0   edx: ddf79000
esi: 00000000   edi: 00000020   ebp: 00011210   esp: ddf79af8
ds: 007b   es: 007b   ss: 0069
Process poweroff (pid: 2191, ti=ddf79000 task=dfe5f550 task.ti=ddf79000)
Stack: df87c740 c04754fa 00000010 10000000 df87c9c0 c07ec740 df87fba0 00000020 
       c07ec64c c045a093 ddf79b74 df87fba0 00000000 00000000 df87c740 00000000 
       00000000 00000000 00000000 df87fba0 00000020 df87c9c0 c04dbea2 00001000 
Call Trace:
 [<c04754fa>] bio_alloc_bioset+0x9b/0xf3
 [<c045a093>] blk_queue_bounce+0xd5/0x258
 [<c04dbea2>] __make_request+0x44/0x348
 [<c04da2bb>] generic_make_request+0x255/0x265
 [<ee260442>] __map_bio+0x44/0x103 [dm_mod]
 [<ee260e2e>] __split_bio+0x17e/0x438 [dm_mod]
 [<c047635c>] bio_add_page+0x25/0x2d
 [<ee26179b>] dm_request+0xdb/0xe8 [dm_mod]
 [<c04da2bb>] generic_make_request+0x255/0x265
 [<c046c958>] kmem_cache_alloc+0x54/0x5e
 [<c04e7ef0>] radix_tree_node_alloc+0x16/0x50
 [<c04dc421>] submit_bio+0xd5/0xdd
 [<c04502d5>] add_to_page_cache+0x9e/0xa6
 [<c0490340>] mpage_end_io_read+0x0/0x66
 [<c048f64d>] mpage_bio_submit+0x19/0x1d
 [<c04907bd>] mpage_readpages+0xa0/0xaa
 [<c04548dc>] __alloc_pages+0x57/0x297
 [<ee274a7b>] ext3_readpages+0x0/0x15 [ext3]
 [<c0455df0>] __do_page_cache_readahead+0x125/0x18b
 [<ee2755fa>] ext3_get_block+0x0/0xd6 [ext3]
 [<c0455e9c>] blockable_page_cache_readahead+0x46/0x99
 [<c045602f>] page_cache_readahead+0xb3/0x178
 [<c04507d8>] do_generic_mapping_read+0xb8/0x37b
 [<c0451304>] __generic_file_aio_read+0x16a/0x1a3
 [<c044fdd1>] file_read_actor+0x0/0xd5
 [<c0451378>] generic_file_aio_read+0x3b/0x42
 [<c047029f>] do_sync_read+0xb6/0xf1
 [<c042fef7>] autoremove_wake_function+0x0/0x2d
 [<c04701e9>] do_sync_read+0x0/0xf1
 [<c0470b78>] vfs_read+0x9f/0x141
 [<c04796db>] kernel_read+0x32/0x43
 [<c04797b3>] prepare_binprm+0xc7/0xcc
 [<c047b251>] do_execve+0xc3/0x1b2
 [<c040337d>] sys_execve+0x2a/0x4a
 [<c0405413>] syscall_call+0x7/0xb
Code: 89 f8 ff 53 18 5b 5e 5f c3 55 57 56 89 c6 53 89 d3 83 ec 14 f6 c2 10 74 05 e8 1e 4b 1c 00 89 dd 81 cd 00 12 01 00 89 e8 83 e0 af <8b> 56 10 ff 56 14 85 c0 89 c3 0f 85 8d 00 00 00 89 f0 e8 db 61 
EIP: [<c0452d86>] mempool_alloc+0x22/0xc9 SS:ESP 0069:ddf79af8
 <0>Kernel panic - not syncing: Fatal exception

Comment 1 Chris Lalancette 2009-11-24 08:59:50 UTC
Additional notes:

Using the steps above and the -164 kernel inside the guest, it seems to be reproducible.  Using the -128 kernel inside the guest, it is *not* reproducible, so it's a regression between 5.3 and 5.4.

Chris Lalancette

Comment 4 Chris Lalancette 2009-11-24 17:19:16 UTC
Bizarrely, after bisecting this, it's this commit that causes a problem:

commit 911d74df73a60067a0d4f31f364e521077a8854c
Author: Chris Lalancette <clalance>
Date:   Thu Mar 5 14:13:05 2009 +0100

    [xen] xen reports bogus LowTotal
    Message-id: 49AFCFE1.9050501
    O-Subject: [RHEL5.4 PATCH]: Xen reports bogus LowTotal
    Bugzilla: 428892
    RH-Acked-by: Don Dutile <ddutile>
    RH-Acked-by: Rik van Riel <riel>
         The xen kernel can report a LowTotal of 4Tb on a system, even though the
    system only has 3.5Gb of memory.  That's obviously totally bogus.  The problem
    is that the balloon driver wasn't properly accounting for totalhigh_pages in
    its calculations, which screws up the rest of the reporting in the system.
    This is a straightforward backport of linux-2.6.18-xen.hg c/s 79 and 128, and
    seems to fix the problem for the reporter.
          This will fix BZ 428892.  Please review and ACK
    Chris Lalancette

diff --git a/drivers/xen/balloon/balloon.c b/drivers/xen/balloon/balloon.c
index 39d7185..e8ce44f 100644
--- a/drivers/xen/balloon/balloon.c
+++ b/drivers/xen/balloon/balloon.c
@@ -93,6 +93,15 @@ static unsigned long frame_list[PAGE_SIZE / sizeof(unsigned long)];
 /* VM /proc information for memory */
 extern unsigned long totalram_pages;
+#ifndef MODULE
+extern unsigned long totalhigh_pages;
+#define inc_totalhigh_pages() (totalhigh_pages++)
+#define dec_totalhigh_pages() (totalhigh_pages--)
+#else
+#define inc_totalhigh_pages() ((void)0)
+#define dec_totalhigh_pages() ((void)0)
+#endif
 /* We may hit the hard limit in Xen. If we do then we remember it. */
 static unsigned long hard_limit;
@@ -137,6 +146,7 @@ static void balloon_append(struct page *page)
        if (PageHighMem(page)) {
                list_add_tail(PAGE_TO_LIST(page), &ballooned_pages);
+               dec_totalhigh_pages();
        } else {
                list_add(PAGE_TO_LIST(page), &ballooned_pages);
@@ -154,8 +164,10 @@ static struct page *balloon_retrieve(void)
        page = LIST_TO_PAGE(ballooned_pages.next);
-       if (PageHighMem(page))
+       if (PageHighMem(page)) {
+               inc_totalhigh_pages();
+       }

Reverting that commit, and only that commit, makes the problem go away.  However, the crash really doesn't have anything directly to do with totalhigh_pages.  My analysis of the crash so far is:

mm/mempool.c:mempool_alloc() crashes at line 220, accessing address 0x00000010.  That means that at that line pool is NULL, and it is trying to access NULL->pool_data.  Going back further in the stack, mempool_alloc() is being called from mm/highmem.c:__blk_queue_bounce(), line 409; the NULL pool is simply passed in from there.  mm/highmem.c:blk_queue_bounce() is what actually chooses the pool.  However, this is quite strange; the pool is set to one of two static pools, either isa_page_pool or page_pool.  There is a BUG(!isa_page_pool), so we are probably not going through that path.  However, page_pool should *never* be NULL; it was initialized early during boot and is never changed.  So this leads to one of two possibilities: either page_pool is initialized but later clobbered (memory corruption), or we should never enter that path under Xen at all (which I'm just not sure about).

Chris Lalancette

Comment 5 Chris Lalancette 2009-11-24 20:24:40 UTC
Got it.  We are missing upstream linux-2.6.18-xen.hg c/s 148:

# HG changeset patch
# User Ian Campbell <ian.campbell>
# Date 1185543936 -3600
# Node ID 667228bf8fc5f1a21719e11c7eb269d0188a2d60
# Parent  88a17da7f3362126182423100a9d7d4c0d854139
BLKFRONT: Make sure we don't use bounce buffers, we don't need them.

Signed-off-by: Ian Campbell <ian.campbell>

diff -r 88a17da7f336 -r 667228bf8fc5 drivers/xen/blkfront/vbd.c
--- a/drivers/xen/blkfront/vbd.c	Thu Jul 26 16:36:52 2007 +0100
+++ b/drivers/xen/blkfront/vbd.c	Fri Jul 27 14:45:36 2007 +0100
@@ -213,6 +213,9 @@
 	/* Make sure buffer addresses are sector-aligned. */
 	blk_queue_dma_alignment(rq, 511);
+	/* Make sure we don't use bounce buffers. */
+	blk_queue_bounce_limit(rq, BLK_BOUNCE_ANY);
 	gd->queue = rq;
 	return 0;

With this in place, my reproducer from the description works just fine.  I'll get this ready for inclusion.

Chris Lalancette

Comment 6 Don Zickus 2009-12-04 19:00:53 UTC
in kernel-2.6.18-177.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However, feel free
to provide a comment indicating that this fix has been verified.

Comment 12 errata-xmlrpc 2010-03-30 07:40:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

