Created attachment 436245 [details]
rhel5.5-32 xen guest config file
Description of problem:
When changing the memory (via xm mem-set) of a RHEL5.5-32 HVM guest (with the balloon driver loaded) running on a RHEL5.5-64 host, the guest kernel panics. The same guest running on a RHEL5.5-32 host does not panic, and a RHEL5.5-64 guest on a RHEL5.5-64 host also works normally.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Check that the balloon driver is loaded in the RHEL5.5-32 HVM guest:
# lsmod |grep xen
xen_vnif 27073 0 [permanent]
xen_vbd 19765 0
xen_balloon 16021 1 xen_vnif,[permanent]
xen_platform_pci 64657 3 xen_vnif,xen_vbd,xen_balloon,[permanent]
2. Balloon the guest's memory down to 400M:
# xm list
Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 1713 2 r----- 259.3
rhel-5.5-32 13 519 1 r----- 3.7
# xm mem-set rhel-5.5-32 400
3. Check the memory info in xenstore:
# xm list -l rhel-5.5-32 |grep memory
Actual results:
After step 2, the RHEL5.5-32 HVM guest kernel panics and then reboots.
Expected results:
The guest should stay up, with its memory ballooned down to 400M.
Additional info:
1. A RHEL5.5-32 HVM guest on a RHEL5.5-32 host works properly with xm mem-set.
2. A RHEL5.5-64 HVM guest on a RHEL5.5-64 host also works properly with xm mem-set.
Created attachment 436246 [details]
Created attachment 436248 [details]
detailed info analyzed by crash
Created attachment 436249 [details]
screenshot of the kernel panic
The code before the BUG:
static int decrease_reservation(unsigned long nr_pages)
{
        ...
        reservation.nr_extents = nr_pages;
        ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
        BUG_ON(ret != nr_pages);
        ...
}
and its disassembly:
0xf896d345 <balloon_process+629>: movl $0xf896fe20,0xc(%esp)
0xf896d34d <balloon_process+637>: lea 0xc(%esp),%ecx
0xf896d351 <balloon_process+641>: mov $0x1,%ebx
0xf896d356 <balloon_process+646>: mov %edi,0x10(%esp)
0xf896d35a <balloon_process+650>: mov 0xf8934ba4,%eax
0xf896d35f <balloon_process+655>: add $0x180,%eax
0xf896d364 <balloon_process+660>: call *%eax
0xf896d366 <balloon_process+662>: cmp %edi,%eax
0xf896d368 <balloon_process+664>: je 0xf896d372 <balloon_process+674>
0xf896d36a <balloon_process+666>: ud2a
We expected edi == eax (ret == nr_pages), but edi = 0x400 (1024 pages, i.e. 4M) and eax = 0xffffffda.
On an error, XENMEM_decrease_reservation should return the extent it was on (the current "start" extent after returning from a preemption), which is a value between 0 and nr_extents. It will also return the current start extent if start_extent > nr_extents, but how would that happen? More specifically, how did the current start extent get to be 0xffffffda in this case? I don't think it could. 0xffffffda looks like an errno to me: 0xffffffda == -38 == -ENOSYS.
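The arithmetic behind that reading can be checked with a small sketch. It is only an illustration: the helper names and the SKETCH_ constants are made up here, and it assumes Linux errno numbering (ENOSYS == 38) and the usual kernel convention that error codes fit in the range 1..4095.

```c
#include <stdint.h>

/* Assumption: Linux errno numbering, where ENOSYS is 38. */
#define SKETCH_LINUX_ENOSYS 38
/* Assumption: kernel convention that errno values lie in 1..4095. */
#define SKETCH_MAX_ERRNO 4095

/* Reinterpret a raw 32-bit register value as a signed two's-complement
 * integer without relying on implementation-defined casts. */
static int32_t reg_to_signed(uint32_t raw)
{
    if (raw & 0x80000000u)
        return -(int32_t)(~raw) - 1;
    return (int32_t)raw;
}

/* Return the positive errno if the raw value looks like a negative
 * errno, or 0 if it looks like an ordinary count. */
static int looks_like_errno(uint32_t raw)
{
    int32_t v = reg_to_signed(raw);

    if (v < 0 && v >= -SKETCH_MAX_ERRNO)
        return (int)-v;
    return 0;
}
```

With these helpers, the eax value from the disassembly decodes to 38, while the edi value 0x400 is treated as a plain extent count.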
Are there any interesting logs in 'xm dmesg'? Does this hvm guest have the pv-on-hvm drivers loaded?
Created attachment 436418 [details]
xm dmesg info in the host
In a RHEL-5.5 guest, all four xenpv drivers are loaded by default, including xen_balloon, as shown in the bug description.
The interesting log from 'xm dmesg' is
(XEN) hvm.c:783:d3 memory_op 1.
This confirms that we don't currently support hvm ballooning when using 32-on-64. The HV is using do_memory_op_compat32() for the hypercall on this config, and it doesn't implement memory_op == 1 == XENMEM_decrease_reservation, thus it returns ENOSYS. So this is apparently the first time this has ever been tested. Looking at upstream Xen, it doesn't look like it would take much to support it, but we can lower the priority since no customers have yet requested this feature.
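A toy model of that failure path, for illustration only. This is not the hypervisor's code: sketch_memory_op_compat32(), guest_would_bug() and SKETCH_HANDLED_OP are hypothetical names, and the only values taken from this report are memory_op 1 == XENMEM_decrease_reservation and the -38 (ENOSYS) return.

```c
/* From this report: memory_op 1 is XENMEM_decrease_reservation. */
#define XENMEM_decrease_reservation 1
/* Hypothetical placeholder for an op the compat path does handle. */
#define SKETCH_HANDLED_OP 100
/* Assumption: Linux errno numbering, where ENOSYS is 38. */
#define SKETCH_ENOSYS 38

/* Toy dispatcher: any op without an explicit case falls through to
 * the default branch and is rejected with -ENOSYS. */
static long sketch_memory_op_compat32(int cmd)
{
    switch (cmd) {
    case SKETCH_HANDLED_OP:
        return 0;
    default:
        return -SKETCH_ENOSYS;
    }
}

/* Mirrors the guest-side check BUG_ON(ret != nr_pages):
 * returns 1 when the guest would hit the BUG, 0 otherwise. */
static int guest_would_bug(long ret, unsigned long nr_pages)
{
    return (unsigned long)ret != nr_pages;
}
```

In this model, a decrease_reservation request for 0x400 pages comes back as -38, which can never equal nr_pages, so the guest-side check fires on the first ballooning attempt.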
This bug was opened against the wrong component anyway (it should be kernel-xen for all rhel5.5 xen kernel/driver issues), and now we know for sure that the fix would be in the hypervisor, so moving it there.
*** This bug has been marked as a duplicate of bug 605697 ***