Bug 620733 - guest kernel panic when changing memory size via xm mem-set for RHEL5.5-32 HVM guest with balloon driver on RHEL5.5-64 host
Keywords:
Status: CLOSED DUPLICATE of bug 605697
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 514491
 
Reported: 2010-08-03 11:15 UTC by Lei Wang
Modified: 2010-08-18 13:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-18 13:44:48 UTC
Target Upstream Version:
Embargoed:


Attachments
rhel5.5-32 xen guest config file (549 bytes, text/plain)
2010-08-03 11:15 UTC, Lei Wang
xend.log (726.83 KB, text/plain)
2010-08-03 11:16 UTC, Lei Wang
detailed info analyzed by crash (1.35 KB, text/plain)
2010-08-03 11:17 UTC, Lei Wang
screenshot while kernel panic (86.26 KB, image/png)
2010-08-03 11:18 UTC, Lei Wang
xm dmesg info in the host (16.00 KB, text/plain)
2010-08-04 02:38 UTC, Lei Wang

Description Lei Wang 2010-08-03 11:15:15 UTC
Created attachment 436245 [details]
rhel5.5-32 xen guest config file

Description of problem:
Changing the memory size (via xm mem-set) of a RHEL5.5-32 HVM guest (with the balloon driver loaded) running on a RHEL5.5-64 host makes the guest kernel panic. The same guest on a RHEL5.5-32 host does not panic, and a RHEL5.5-64 guest on a RHEL5.5-64 host also works normally.

Version-Release number of selected component (if applicable):
Host:
RHEL5.5-64
xen-3.0.3-114.el5
kernel-xen-2.6.18-210.el5
Guest:
RHEL5.5-32(2.6.18-194.el5)

How reproducible:
Always

Steps to Reproduce:
1. Check that the balloon driver is loaded in the RHEL5.5-32 HVM guest:

# lsmod |grep xen
xen_vnif               27073  0 [permanent]
xen_vbd                19765  0
xen_balloon            16021  1 xen_vnif,[permanent]
xen_platform_pci       64657  3 xen_vnif,xen_vbd,xen_balloon,[permanent]

2. Balloon the guest's memory down to 400M:
# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1713     2 r-----    259.3
rhel-5.5-32                               13      519     1 r-----      3.7
# xm mem-set rhel-5.5-32 400

3. Check the memory info in xenstore:
# xm list -l  rhel-5.5-32 |grep memory
    (memory 400)
    (shadow_memory 5)

Actual results:
After step 2, the RHEL5.5-32 HVM guest kernel panics and the guest then reboots.

Expected results:
The guest should stay up and its memory should balloon down to 400M.

Additional info:
1. A RHEL5.5-32 HVM guest on a RHEL5.5-32 host works properly with xm mem-set.
2. A RHEL5.5-64 HVM guest on a RHEL5.5-64 host also works properly with xm mem-set.

Comment 1 Lei Wang 2010-08-03 11:16:06 UTC
Created attachment 436246 [details]
xend.log

Comment 2 Lei Wang 2010-08-03 11:17:18 UTC
Created attachment 436248 [details]
detailed info analyzed by crash

Comment 3 Lei Wang 2010-08-03 11:18:24 UTC
Created attachment 436249 [details]
screenshot while kernel panic

Comment 4 Andrew Jones 2010-08-03 13:18:41 UTC
The code before the BUG:

static int decrease_reservation(unsigned long nr_pages)
...
        set_xen_guest_handle(reservation.extent_start, frame_list);
        reservation.nr_extents   = nr_pages;
        ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
        BUG_ON(ret != nr_pages);
...

and its disassembly:

0xf896d345 <balloon_process+629>:       movl   $0xf896fe20,0xc(%esp)
0xf896d34d <balloon_process+637>:       lea    0xc(%esp),%ecx
0xf896d351 <balloon_process+641>:       mov    $0x1,%ebx
0xf896d356 <balloon_process+646>:       mov    %edi,0x10(%esp)
0xf896d35a <balloon_process+650>:       mov    0xf8934ba4,%eax
0xf896d35f <balloon_process+655>:       add    $0x180,%eax
0xf896d364 <balloon_process+660>:       call   *%eax
0xf896d366 <balloon_process+662>:       cmp    %edi,%eax
0xf896d368 <balloon_process+664>:       je     0xf896d372 <balloon_process+674>
0xf896d36a <balloon_process+666>:       ud2a

We expected edi == eax (ret == nr_pages), but edi = 0x400 (0x400 pages, i.e. 4M) and eax = 0xffffffda.

On error, XENMEM_decrease_reservation should return the extent it was currently on (the current "start" extent after resuming from a preemption), which is a value between 0 and nr_extents. It also returns the current start extent if start_extent > nr_extents, but how would that happen? More specifically, how could the current start extent come to be 0xffffffda here? I don't think it could, and 0xffffffda looks like an errno to me anyway: 0xffffffda == -38 == -ENOSYS.
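The arithmetic behind that reading can be checked with a small standalone sketch. It is only an illustration, not code from the balloon driver: the helper names are hypothetical, the errno value 38 is the one quoted above, and 4 KiB pages are assumed.

```c
#include <stdint.h>

/* Assumption: ENOSYS == 38, the value quoted in this comment. */
#define SKETCH_ENOSYS 38

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Reinterpret a raw 32-bit register value (what crash shows for %eax)
 * as a signed hypercall return value. Written without relying on
 * implementation-defined unsigned-to-signed conversion.
 */
static inline int32_t sketch_signed_ret(uint32_t raw)
{
    if (raw > (uint32_t)INT32_MAX)
        return -(int32_t)(UINT32_MAX - raw) - 1;
    return (int32_t)raw;
}

/* An extent count can never be negative, so a negative value is an error. */
static inline int sketch_is_error(int32_t ret)
{
    return ret < 0;
}

/* Convert a page count to MiB under the 4 KiB page assumption. */
static inline uint32_t sketch_pages_to_mib(uint32_t nr_pages)
{
    return (uint32_t)(((uint64_t)nr_pages * SKETCH_PAGE_SIZE) >> 20);
}
```

With these helpers, sketch_signed_ret(0xffffffdaU) evaluates to -38 (that is, -SKETCH_ENOSYS), and sketch_pages_to_mib(0x400U) evaluates to 4, matching the "4M" noted for edi.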

Are there any interesting logs in 'xm dmesg'? Does this hvm guest have the pv-on-hvm drivers loaded?

Andrew

Comment 5 Lei Wang 2010-08-04 02:38:54 UTC
Created attachment 436418 [details]
xm dmesg info in the host

In the RHEL-5.5 guest, all four Xen PV drivers are loaded by default, including xen_balloon, as shown in the bug description.

Comment 6 Andrew Jones 2010-08-04 08:34:59 UTC
The interesting log from 'xm dmesg' is:

(XEN) hvm.c:783:d3 memory_op 1.

This confirms that we don't currently support HVM ballooning when using 32-on-64. On this config the HV uses do_memory_op_compat32() for the hypercall, and that function doesn't implement memory_op == 1 == XENMEM_decrease_reservation, so it returns -ENOSYS. Apparently this is the first time this has ever been tested. Looking at upstream Xen, it doesn't look like it would take much to support it, but we can lower the priority since no customers have yet requested this feature.
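As a rough illustration of that failure mode, here is a toy model. It is not the Xen source: every toy_* name is hypothetical, TOY_OP_SUPPORTED is a made-up placeholder op, and the native handler simply pretends all extents were released. The only facts taken from this bug are that sub-op 1 is XENMEM_decrease_reservation, that the compat path returns -ENOSYS (-38) for it, and that the guest does BUG_ON(ret != nr_pages).

```c
/* Assumption: ENOSYS == 38, as worked out in comment 4. */
#define TOY_ENOSYS 38

enum toy_memop {
    TOY_XENMEM_DECREASE_RESERVATION = 1, /* sub-op number from the xm dmesg log */
    TOY_OP_SUPPORTED = 100               /* hypothetical placeholder op */
};

/* Toy native handler: pretends every requested extent was released. */
static inline long toy_native_memory_op(int op, unsigned long nr_extents)
{
    switch (op) {
    case TOY_XENMEM_DECREASE_RESERVATION:
    case TOY_OP_SUPPORTED:
        return (long)nr_extents;
    default:
        return -TOY_ENOSYS;
    }
}

/* Toy compat handler: decrease_reservation is not wired up, so it hits default. */
static inline long toy_compat_memory_op(int op, unsigned long nr_extents)
{
    switch (op) {
    case TOY_OP_SUPPORTED:
        return (long)nr_extents;
    default:
        return -TOY_ENOSYS;
    }
}

/* Mirrors the guest's check: nonzero means BUG_ON(ret != nr_pages) would fire. */
static inline int toy_would_hit_bug_on(long (*memory_op)(int, unsigned long),
                                       unsigned long nr_pages)
{
    long ret = memory_op(TOY_XENMEM_DECREASE_RESERVATION, nr_pages);
    return ret != (long)nr_pages;
}
```

In this model, toy_would_hit_bug_on(toy_compat_memory_op, 0x400) is nonzero while toy_would_hit_bug_on(toy_native_memory_op, 0x400) is zero, which matches the observation that only the 32-on-64 configuration panics.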

Comment 7 Andrew Jones 2010-08-04 08:40:22 UTC
This bug was opened against the wrong component anyway (it should be kernel-xen for all rhel5.5 xen kernel/driver issues), and now we know for sure that the fix would be in the hypervisor, so moving it there.

Comment 8 Andrew Jones 2010-08-18 13:44:48 UTC

*** This bug has been marked as a duplicate of bug 605697 ***

