Description of problem:

I was doing some save/restore testing of RHEL PV guests. My dom0 is an AMD RevF with 2 dual-core processors, 2G of memory, running kernel 2.6.18-164.el5xen and xen-3.0.3-94.el5. My guest is a 32-bit RHEL-5 PV guest running 2.6.18-164.el5xen, 4 vcpus, mem=512, maxmem=1500.

Here are the steps I performed:
1) Boot up the PV guest (with 512M of memory)
2) xm save the PV guest
3) xm restore the PV guest
4) xm mem-set <guest> 1500
5) xm save the PV guest
6) xm restore the PV guest
7) poweroff inside the PV guest

On step 7), I got a crash that looks like:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
c0452d86
1e818000 -> *pde = 00000000:68800027
1f045000 -> *pme = 00000000:46822067
13a83000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP
last sysfs file: /class/misc/autofs/dev
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh scsi_mod parport_pc lp parport xennet pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0061:[<c0452d86>]    Not tainted VLI
EFLAGS: 00010206   (2.6.18-164.el5xen #1)
EIP is at mempool_alloc+0x22/0xc9
eax: 00011200   ebx: 00000010   ecx: df87fba0   edx: ddf79000
esi: 00000000   edi: 00000020   ebp: 00011210   esp: ddf79af8
ds: 007b   es: 007b   ss: 0069
Process poweroff (pid: 2191, ti=ddf79000 task=dfe5f550 task.ti=ddf79000)
Stack: df87c740 c04754fa 00000010 10000000 df87c9c0 c07ec740 df87fba0 00000020
       c07ec64c c045a093 ddf79b74 df87fba0 00000000 00000000 df87c740 00000000
       00000000 00000000 00000000 df87fba0 00000020 df87c9c0 c04dbea2 00001000
Call Trace:
 [<c04754fa>] bio_alloc_bioset+0x9b/0xf3
 [<c045a093>] blk_queue_bounce+0xd5/0x258
 [<c04dbea2>] __make_request+0x44/0x348
 [<c04da2bb>] generic_make_request+0x255/0x265
 [<ee260442>] __map_bio+0x44/0x103 [dm_mod]
 [<ee260e2e>] __split_bio+0x17e/0x438 [dm_mod]
 [<c047635c>] bio_add_page+0x25/0x2d
 [<ee26179b>] dm_request+0xdb/0xe8 [dm_mod]
 [<c04da2bb>] generic_make_request+0x255/0x265
 [<c046c958>] kmem_cache_alloc+0x54/0x5e
 [<c04e7ef0>] radix_tree_node_alloc+0x16/0x50
 [<c04dc421>] submit_bio+0xd5/0xdd
 [<c04502d5>] add_to_page_cache+0x9e/0xa6
 [<c0490340>] mpage_end_io_read+0x0/0x66
 [<c048f64d>] mpage_bio_submit+0x19/0x1d
 [<c04907bd>] mpage_readpages+0xa0/0xaa
 [<c04548dc>] __alloc_pages+0x57/0x297
 [<ee274a7b>] ext3_readpages+0x0/0x15 [ext3]
 [<c0455df0>] __do_page_cache_readahead+0x125/0x18b
 [<ee2755fa>] ext3_get_block+0x0/0xd6 [ext3]
 [<c0455e9c>] blockable_page_cache_readahead+0x46/0x99
 [<c045602f>] page_cache_readahead+0xb3/0x178
 [<c04507d8>] do_generic_mapping_read+0xb8/0x37b
 [<c0451304>] __generic_file_aio_read+0x16a/0x1a3
 [<c044fdd1>] file_read_actor+0x0/0xd5
 [<c0451378>] generic_file_aio_read+0x3b/0x42
 [<c047029f>] do_sync_read+0xb6/0xf1
 [<c042fef7>] autoremove_wake_function+0x0/0x2d
 [<c04701e9>] do_sync_read+0x0/0xf1
 [<c0470b78>] vfs_read+0x9f/0x141
 [<c04796db>] kernel_read+0x32/0x43
 [<c04797b3>] prepare_binprm+0xc7/0xcc
 [<c047b251>] do_execve+0xc3/0x1b2
 [<c040337d>] sys_execve+0x2a/0x4a
 [<c0405413>] syscall_call+0x7/0xb
 =======================
Code: 89 f8 ff 53 18 5b 5e 5f c3 55 57 56 89 c6 53 89 d3 83 ec 14 f6 c2 10 74 05 e8 1e 4b 1c 00 89 dd 81 cd 00 12 01 00 89 e8 83 e0 af <8b> 56 10 ff 56 14 85 c0 89 c3 0f 85 8d 00 00 00 89 f0 e8 db 61
EIP: [<c0452d86>] mempool_alloc+0x22/0xc9 SS:ESP 0069:ddf79af8
<0>Kernel panic - not syncing: Fatal exception
Additional notes: Using the steps above with the -164 kernel inside the guest, the crash is reproducible. With the -128 kernel inside the guest, it is *not* reproducible, so this is a regression between 5.3 and 5.4.

Chris Lalancette
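For convenience, the reproduction steps above can be sketched as a dom0 shell script. The guest name and save-file path are placeholders (assumptions, not from the report); the `xm` calls mirror steps 2) through 6) exactly.

```shell
#!/bin/sh
# Sketch of the reproducer, to be run in dom0. GUEST is a placeholder
# domain name -- substitute the name of your 32-bit RHEL-5 PV guest.
GUEST=${GUEST:-rhel5-pv-guest}
SAVEFILE=${SAVEFILE:-/var/tmp/$GUEST.chk}

save_restore() {
    # One save/restore cycle (used for steps 2-3 and again for 5-6)
    xm save "$GUEST" "$SAVEFILE" && xm restore "$SAVEFILE"
}

run_reproducer() {
    save_restore                  # steps 2) and 3), at mem=512
    xm mem-set "$GUEST" 1500      # step 4), balloon up to maxmem
    save_restore                  # steps 5) and 6), at mem=1500
    # step 7): run 'poweroff' inside the guest; with the -164 guest
    # kernel this is where the mempool_alloc oops appears
}

# Only attempt the xm calls on a real Xen dom0
if command -v xm >/dev/null 2>&1; then
    run_reproducer
else
    echo "xm not found; run this on the dom0 (guest: $GUEST)"
fi
```

Step 7) still has to be issued inside the guest itself; the script only drives the dom0 side.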
Bizarrely, after bisecting this, it's this commit that causes the problem:

commit 911d74df73a60067a0d4f31f364e521077a8854c
Author: Chris Lalancette <clalance>
Date:   Thu Mar 5 14:13:05 2009 +0100

    [xen] xen reports bogus LowTotal
    Message-id: 49AFCFE1.9050501
    O-Subject: [RHEL5.4 PATCH]: Xen reports bogus LowTotal
    Bugzilla: 428892
    RH-Acked-by: Don Dutile <ddutile>
    RH-Acked-by: Rik van Riel <riel>

    All,
         The xen kernel can report a LowTotal of 4Tb on a system, even though the
    system only has 3.5Gb of memory. That's obviously totally bogus. The problem
    is that the balloon driver wasn't properly accounting for totalhigh_pages in
    its calculations, which screws up the rest of the reporting in the system.
         This is a straightforward backport of linux-2.6.18-xen.hg c/s 79 and 128,
    and seems to fix the problem for the reporter. This will fix BZ 428892.
    Please review and ACK.

    -- Chris Lalancette

diff --git a/drivers/xen/balloon/balloon.c b/drivers/xen/balloon/balloon.c
index 39d7185..e8ce44f 100644
--- a/drivers/xen/balloon/balloon.c
+++ b/drivers/xen/balloon/balloon.c
@@ -93,6 +93,15 @@ static unsigned long frame_list[PAGE_SIZE / sizeof(unsigned long)];

 /* VM /proc information for memory */
 extern unsigned long totalram_pages;
+
+#ifndef MODULE
+extern unsigned long totalhigh_pages;
+#define inc_totalhigh_pages() (totalhigh_pages++)
+#define dec_totalhigh_pages() (totalhigh_pages--)
+#else
+#define inc_totalhigh_pages() ((void)0)
+#define dec_totalhigh_pages() ((void)0)
+#endif

 /* We may hit the hard limit in Xen. If we do then we remember it.
  */
 static unsigned long hard_limit;

@@ -137,6 +146,7 @@ static void balloon_append(struct page *page)
 	if (PageHighMem(page)) {
 		list_add_tail(PAGE_TO_LIST(page), &ballooned_pages);
 		balloon_high++;
+		dec_totalhigh_pages();
 	} else {
 		list_add(PAGE_TO_LIST(page), &ballooned_pages);
 		balloon_low++;
@@ -154,8 +164,10 @@ static struct page *balloon_retrieve(void)
 	page = LIST_TO_PAGE(ballooned_pages.next);
 	UNLIST_PAGE(page);

-	if (PageHighMem(page))
+	if (PageHighMem(page)) {
 		balloon_high--;
+		inc_totalhigh_pages();
+	} else
 		balloon_low--;

Reverting that commit, and only that commit, makes the problem go away. However, the crash doesn't really have anything directly to do with totalhigh_pages. My analysis of the crash so far:

mm/mempool.c:mempool_alloc() crashes at line 220, accessing 00000010. That means that at that line, pool is NULL and it's trying to access NULL->pool_data. Going back further in the stack, mempool_alloc() is being called from mm/highmem.c:__blk_queue_bounce(), line 409; the NULL pool is just passed in from there. mm/highmem.c:blk_queue_bounce() is the one that actually picks the pool. This is quite strange, though: the pool is set to one of two static pools, either isa_page_pool or page_pool. There is a BUG(!isa_page_pool) on the ISA path, so we are probably not going through that one. And page_pool should *never* be NULL; it was initialized early during boot and is never changed afterwards. So this leads to one of two possibilities: either page_pool is being initialized and later clobbered (memory corruption), or we should never be entering this path under Xen at all (which I'm just not sure about).

Chris Lalancette
Got it. We are missing upstream linux-2.6.18-xen.hg c/s 148:

# HG changeset patch
# User Ian Campbell <ian.campbell>
# Date 1185543936 -3600
# Node ID 667228bf8fc5f1a21719e11c7eb269d0188a2d60
# Parent 88a17da7f3362126182423100a9d7d4c0d854139
BLKFRONT: Make sure we don't use bounce buffers, we don't need them.

Signed-off-by: Ian Campbell <ian.campbell>

diff -r 88a17da7f336 -r 667228bf8fc5 drivers/xen/blkfront/vbd.c
--- a/drivers/xen/blkfront/vbd.c	Thu Jul 26 16:36:52 2007 +0100
+++ b/drivers/xen/blkfront/vbd.c	Fri Jul 27 14:45:36 2007 +0100
@@ -213,6 +213,9 @@
 	/* Make sure buffer addresses are sector-aligned. */
 	blk_queue_dma_alignment(rq, 511);

+	/* Make sure we don't use bounce buffers. */
+	blk_queue_bounce_limit(rq, BLK_BOUNCE_ANY);
+
 	gd->queue = rq;

 	return 0;

With this in place, my reproducer in the summary works just fine. I'll get this ready for inclusion.

Chris Lalancette
in kernel-2.6.18-177.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html