Bug 661989 - [ia64] xencomm_privcmd_domctl: unknown domctl cmd 45
Summary: [ia64] xencomm_privcmd_domctl: unknown domctl cmd 45
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.6
Hardware: ia64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: Laszlo Ersek
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
: 696599 (view as bug list)
Depends On:
Blocks: 638638
 
Reported: 2010-12-10 09:18 UTC by Alexander Todorov
Modified: 2011-04-14 14:26 UTC
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 970066 (view as bug list)
Environment:
Last Closed: 2011-01-10 09:46:05 UTC
Target Upstream Version:
Embargoed:


Attachments
console log of RHEL5.5 crash (8.58 KB, text/plain)
2010-12-22 22:12 UTC, Laszlo Ersek
current CPU and memory topology of hp-rx8640-03 (65.74 KB, image/png)
2010-12-24 22:11 UTC, Laszlo Ersek
hp-rx2660-04 crash logs and hp-rx2660-01.brq / hp-rx2660-04 Xen boot logs (8.03 KB, application/x-bzip2)
2010-12-27 15:32 UTC, Laszlo Ersek

Description Alexander Todorov 2010-12-10 09:18:28 UTC
Description of problem:
Xen dom0 crashes and reboots upon starting HVM guest.

Version-Release number of selected component (if applicable):
xen-3.0.3-120.el5
kernel-xen-2.6.18-236.el5

How reproducible:
Always on particular hardware.

Steps to Reproduce:
1. Boot with kernel-xen on hp-rx8640-03.rhts.eng.bos.redhat.com. Append xenheap_megabytes=122 to the boot args because of bug #521865
2. Create new HVM guest with virt-manager
3. The guest properties are: HVM, boot from boot.iso, Type=Linux, RHEL5.4 or later
4. Both host and guest OS is RHEL 5.6 snap #5
  
Actual results:
Crash

Expected results:
Guest is started.

Additional info:

xencomm_privcmd_domctl: unknown domctl cmd 45
(XEN) *** xen_handle_domain_access: exception table lookup failed, iip=0xf0000705fc042080, addr=0x7f, spinning...
(XEN) $$$$$ PANIC in domain 0 (k6=0xf000000008838000): *** xen_handle_domain_access: exception table lookup failed, iip=0xf0000705fc042080, addr=0x7f, spinning...
(XEN) d 0xf000000008854080 domid 0
(XEN) vcpu 0xf000000008838000 vcpu 0
(XEN) 
(XEN) CPU 0
(XEN) psr : 0000121008222038 ifs : 8000000000000408 ip  : [<f0000705fc042081>]
(XEN) ip is at ???
(XEN) unat: 0000000000000000 pfs : 000000000000028d rsc : 0000000000000003
(XEN) rnat: 0000000000000714 bsps: 0000000000000003 pr  : 00000000006a81d9
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f00000000409f160 b6  : f000000008701000 b7  : f0000705fc020190
(XEN) f6  : 0fffafffffffff0000000 f7  : 0ffdef000000000000000
(XEN) f8  : 10002f000000000000000 f9  : 100038000000000000000
(XEN) f10 : 0fffeeffffffff1000000 f11 : 1003e0000000000000000
(XEN) r1  : f0000000043a4b50 r2  : f00000000883fc40 r3  : 0000000000000590
(XEN) r8  : 0000000000000000 r9  : 0000000000000000 r10 : 0000000000000000
(XEN) r11 : 0000000000000000 r12 : f00000000883fba0 r13 : f000000008838000
(XEN) r14 : 000000000000007f r15 : f00000000b7e8150 r16 : 0000000000000000
(XEN) r17 : 0000000000010000 r18 : 000001ffffffffe0 r19 : 0000007ffffffff8
(XEN) r20 : 0000000000000001 r21 : 0000000000000280 r22 : 0000000000000001
(XEN) r23 : f00000000b7d0678 r24 : f0000000040ac110 r25 : f00000000b7d0000
(XEN) r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0800000000000000
(XEN) r29 : f00000000883fb48 r30 : 00000121f8000000 r31 : f0000000041b4f00
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0
(XEN)                                 sp=f00000000883f650 bsp=f0000000088395b0
(XEN)  [<f000000004094040>] panic_domain+0x120/0x170
(XEN)                                 sp=f00000000883f820 bsp=f000000008839548
(XEN)  [<f000000004086540>] ia64_do_page_fault+0x690/0x6a0
(XEN)                                 sp=f00000000883f960 bsp=f0000000088394b8
(XEN)  [<f0000000040b9320>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f00000000883f9a0 bsp=f0000000088394b8
(XEN)  [<f0000705fc042080>] ???
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839478
(XEN)  [<f00000000409f160>] update_vhpi+0xb0/0xd0
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839450
(XEN)  [<f00000000409f670>] vlsapic_reset+0x160/0x1f0
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839430
(XEN)  [<f0000000040a4090>] vmx_final_setup_guest+0x310/0x600
(XEN)                                 sp=f00000000883fba0 bsp=f0000000088393d0
(XEN)  [<f000000004061e90>] arch_set_info_guest+0x3b0/0x3e0
(XEN)                                 sp=f00000000883fc40 bsp=f000000008839388
(XEN)  [<f00000000401dec0>] do_domctl+0x1760/0x17d0
(XEN)                                 sp=f00000000883fc40 bsp=f000000008839308
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f00000000883fe00 bsp=f000000008839308
(XEN) domain_crash_sync called from xenmisc.c:171
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) d 0xf000000008854080 domid 0
(XEN) vcpu 0xf000000008838000 vcpu 0
(XEN) 
(XEN) CPU 0
(XEN) psr : 0000141208526030 ifs : 0000000000000006 ip  : [<a00000010006a3e2>]
(XEN) ip is at ???
(XEN) unat: 0000000000000000 pfs : 8000000000000591 rsc : 000000000000000b
(XEN) rnat: 0000000000000000 bsps: e0000700cc0291b0 pr  : 00000000006a6999
(XEN) ldrs: 0000000001380000 ccv : 0000000000000002 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : a00000010006feb0 b6  : a00000010006fd90 b7  : a0000001000180b0
(XEN) f6  : 1003e0000000000000002 f7  : 1003eaf8af8af8af8af8b
(XEN) f8  : 1003e0000000000000046 f9  : 1003e0000000000000002
(XEN) f10 : 1003e0000000000000046 f11 : 1003eaf8af8af8af8af8b
(XEN) r1  : a000000100c782a0 r2  : 000000001c032d24 r3  : 00000000e0196920
(XEN) r8  : 0000000000000001 r9  : bfffffffabf67894 r10 : 150261daabf67894
(XEN) r11 : 0000000000000000 r12 : e0000700cc02fb40 r13 : e0000700cc028000
(XEN) r14 : 00000000000007ff r15 : 0000000000000024 r16 : 00000000c4163bfc
(XEN) r17 : 00002a046fac7894 r18 : 150237d63c4a0000 r19 : 00002b5f6b049d80
(XEN) r20 : 0000015afb5824ec r21 : 000000568dd07a3c r22 : 000000055c9ba3e4
(XEN) r23 : 0000000620b1dfe0 r24 : 0000000620b1dfe0 r25 : a0007ff9df21c000
(XEN) r26 : a000000100a918e8 r27 : a000000100a918e8 r28 : 0000000000000000
(XEN) r29 : 0000000000024000 r30 : 0000000000000000 r31 : a0007fffffd39ff8
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0
(XEN)                                 sp=f00000000883f650 bsp=f000000008839608
(XEN)  [<f00000000401f300>] __domain_crash+0x100/0x140
(XEN)                                 sp=f00000000883f820 bsp=f0000000088395d8
(XEN)  [<f00000000401f380>] __domain_crash_synchronous+0x40/0xf0
(XEN)                                 sp=f00000000883f820 bsp=f0000000088395b0
(XEN)  [<f000000004094080>] panic_domain+0x160/0x170
(XEN)                                 sp=f00000000883f820 bsp=f000000008839548
(XEN)  [<f000000004086540>] ia64_do_page_fault+0x690/0x6a0
(XEN)                                 sp=f00000000883f960 bsp=f0000000088394b8
(XEN)  [<f0000000040b9320>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f00000000883f9a0 bsp=f0000000088394b8
(XEN)  [<f0000705fc042080>] ???
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839478
(XEN)  [<f00000000409f160>] update_vhpi+0xb0/0xd0
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839450
(XEN)  [<f00000000409f670>] vlsapic_reset+0x160/0x1f0
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839430
(XEN)  [<f0000000040a4090>] vmx_final_setup_guest+0x310/0x600
(XEN)                                 sp=f00000000883fba0 bsp=f0000000088393d0
(XEN)  [<f000000004061e90>] arch_set_info_guest+0x3b0/0x3e0
(XEN)                                 sp=f00000000883fc40 bsp=f000000008839388
(XEN)  [<f00000000401dec0>] do_domctl+0x1760/0x17d0
(XEN)                                 sp=f00000000883fc40 bsp=f000000008839308
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f00000000883fe00 bsp=f000000008839308
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0
(XEN)                                 sp=f00000000883f650 bsp=f000000008839608
(XEN)  [<f00000000401f310>] __domain_crash+0x110/0x140
(XEN)                                 sp=f00000000883f820 bsp=f0000000088395d8
(XEN)  [<f00000000401f380>] __domain_crash_synchronous+0x40/0xf0
(XEN)                                 sp=f00000000883f820 bsp=f0000000088395b0
(XEN)  [<f000000004094080>] panic_domain+0x160/0x170
(XEN)                                 sp=f00000000883f820 bsp=f000000008839548
(XEN)  [<f000000004086540>] ia64_do_page_fault+0x690/0x6a0
(XEN)                                 sp=f00000000883f960 bsp=f0000000088394b8
(XEN)  [<f0000000040b9320>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f00000000883f9a0 bsp=f0000000088394b8
(XEN)  [<f0000705fc042080>] ???
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839478
(XEN)  [<f00000000409f160>] update_vhpi+0xb0/0xd0
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839450
(XEN)  [<f00000000409f670>] vlsapic_reset+0x160/0x1f0
(XEN)                                 sp=f00000000883fba0 bsp=f000000008839430
(XEN)  [<f0000000040a4090>] vmx_final_setup_guest+0x310/0x600
(XEN)                                 sp=f00000000883fba0 bsp=f0000000088393d0
(XEN)  [<f000000004061e90>] arch_set_info_guest+0x3b0/0x3e0
(XEN)                                 sp=f00000000883fc40 bsp=f000000008839388
(XEN)  [<f00000000401dec0>] do_domctl+0x1760/0x17d0
(XEN)                                 sp=f00000000883fc40 bsp=f000000008839308
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f00000000883fe00 bsp=f000000008839308
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.


# cat /boot/efi/efi/redhat/elilo.conf 
prompt
timeout=20
default=xen
relocatable

image=vmlinuz-2.6.18-236.el5xen
	vmm=xen.gz-2.6.18-236.el5
	label=xen
	initrd=initrd-2.6.18-236.el5xen.img
	read-only
	root=/dev/VolGroup00/LogVol00
	append="xenheap_megabytes=122 -- rhgb quiet"
image=vmlinuz-2.6.18-236.el5
	label=linux
	initrd=initrd-2.6.18-236.el5.img
	read-only
	root=/dev/VolGroup00/LogVol00
	append="rhgb quiet"

Comment 2 Laszlo Ersek 2010-12-13 13:19:57 UTC
Both "xen/include/public/domctl.h" and "kernel/include/xen/interface/domctl.h" have 

/* Assign PCI device to HVM guest. Sets up IOMMU structures. */
#define XEN_DOMCTL_assign_device      37
#define XEN_DOMCTL_test_assign_device 45

But neither is handled by xencomm_privcmd_domctl() in "kernel/arch/ia64/xen/xcom_privcmd.c".

Comment 3 Andrew Jones 2010-12-13 14:29:32 UTC
Please attach the guest config file.

Comment 4 Laszlo Ersek 2010-12-13 14:31:25 UTC
Alexander, can you please add the host dmesg and the xend log? Thanks.

Comment 5 Laszlo Ersek 2010-12-13 15:36:25 UTC
I believe the failed XEN_DOMCTL_test_assign_device is not the direct cause of the crash. The xend log file could confirm this. I think xend continued after it received (and logged) the error for XEN_DOMCTL_test_assign_device.

The lower frames of the crash should look like this (with some gaps, probably), to be read bottom-up:

* [... hypervisor stack trace here from the Description ...]
* xc_hvm_build() in "tools/libxc/ia64/xc_ia64_hvm_build.c" (xen userspace) issues
  XEN_DOMCTL_setvcpucontext.
* pyxc_hvm_build() in "tools/python/xen/lowlevel/xc/xc.c" (xen userspace)
  calls xc_hvm_build().
* buildDomain() in "tools/python/xen/xend/image.py" (xen userspace) calls
  "xc.hvm_build".

At the top, update_vhpi() calls ia64_call_vsa() in "arch/ia64/vmx/vmx_vsa.S". The faulting instruction pointer (f0000705fc042080) seems to fall in a very different address range than the other entries on the hypervisor's stack. So I suppose either the call to ia64_call_vsa() fails, or that latter function (which I don't understand) does something nasty here:

    br.cond.sptk b6         // call the service

I guess this might even be a hypervisor crash.

Alexander, Andrew tells me host dump captures aren't supported on ia64 and thus we'll have to debug by serial. Can you please describe the hardware more closely so that we can allocate a similar test machine and try to reproduce the crash?

Or can we experiment with hp-rx8640-03.rhts.eng.bos.redhat.com itself, interactively? (With serial console connected to a nearby machine?)

Comment 7 Alexander Todorov 2010-12-14 08:18:32 UTC
(In reply to comment #3)
> Please attach the guest config file.

I will have to reserve the system again to grab this and other files. It wasn't anything special. I created a full virt guest with virt-manager, chose to boot from boot.iso, assigned a single file image for the guest disk and selected bridged networking.


(In reply to comment #5)

> Or can we experiment with hp-rx8640-03.rhts.eng.bos.redhat.com itself,
> interactively? (With serial console connected to a nearby machine?)

This system is available in Beaker and has serial console configured. You can reserve it and experiment with it.

Comment 9 Laszlo Ersek 2010-12-14 14:55:13 UTC
The message

*** xen_handle_domain_access: exception table lookup failed,
iip=0xf0000705fc042080, addr=0x7f, spinning...

is from ia64_do_page_fault(). Citation from "arch/ia64/xen/faults.c"
(hypervisor) reindented for bugzilla:

251 if (!user_mode(regs)) {
252   /* The fault occurs inside Xen.  */
253   if (!ia64_done_with_exception(regs)) {
254	// should never happen.  If it does, region 0 addr may
255	// indicate a bad xen pointer
256	printk("*** xen_handle_domain_access: exception table"
257	       " lookup failed, iip=0x%lx, addr=0x%lx, "
258	       "spinning...\n", iip, address);
259	panic_domain(regs, "*** xen_handle_domain_access: "
260		     "exception table lookup failed, "
261		     "iip=0x%lx, addr=0x%lx, spinning...\n",
262		     iip, address);

Comment 12 Laszlo Ersek 2010-12-16 19:45:05 UTC
I could reproduce the bug. Host:

# uname -r
2.6.18-236.el5

# yum install xen
(10/12): xen-3.0.3-120.el5.ia64.rpm
(12/12): kernel-xen-2.6.18-236.el5.ia64.rpm

... append="xenheap_megabytes=122 --"

# uname -r
2.6.18-236.el5xen

# yum install virt-manager

# rpm -ivh http://download.devel.redhat.com/brewroot/packages/xen-ia64-guest-firmware/1.0.13/1/ia64/xen-ia64-guest-firmware-1.0.13-1.ia64.rpm

Then created the following HVM guest:

- virt method: fully virtualized
- init mem: 4G
- max mem: 4G
- numvcpu: 8
- OS: RHEL 5.4 or later
- Installation source:
  http://download.devel.redhat.com/nightly/RHEL5.6-Server-20101215.n/5/ia64/os/
  (No ia64 was available for RHEL5.6-Server-20101208.n on the download
  server)
- disk image: file based
- disk size: 4G
- network: shared physdev
- network target: xenbr0
- sound: off

Virt-manager created some storage file, then booted the guest, then I got a hypervisor dump that matched the one in comment #0, except for the following difference:

--- 0	2010-12-16 20:44:26.535262936 +0100
+++ 1	2010-12-16 20:43:20.371263092 +0100
@@ -21,7 +21,7 @@
 (XEN) r14 : 000000000000007f r15 : f00000000b7e8150 r16 : 0000000000000000
 (XEN) r17 : 0000000000010000 r18 : 000001ffffffffe0 r19 : 0000007ffffffff8
 (XEN) r20 : 0000000000000001 r21 : 0000000000000280 r22 : 0000000000000001
-(XEN) r23 : f00000000b7d0678 r24 : f0000000040ac110 r25 : f00000000b7d0000
+(XEN) r23 : f00000000b7a0678 r24 : f0000000040ac110 r25 : f00000000b7a0000
 (XEN) r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0800000000000000
 (XEN) r29 : f00000000883fb48 r30 : 00000121f8000000 r31 : f0000000041b4f00
 (XEN) 
@@ -57,22 +57,22 @@
 (XEN) psr : 0000141208526030 ifs : 0000000000000006 ip  : [<a00000010006a3e2>]
 (XEN) ip is at ???
 (XEN) unat: 0000000000000000 pfs : 8000000000000591 rsc : 000000000000000b
-(XEN) rnat: 0000000000000000 bsps: e0000700cc0291b0 pr  : 00000000006a6999
-(XEN) ldrs: 0000000001380000 ccv : 0000000000000002 fpsr: 0009804c0270033f
+(XEN) rnat: 0000000000000000 bsps: e0000700cd0691b0 pr  : 00000000006a6999
+(XEN) ldrs: 0000000001380000 ccv : 0000000000000000 fpsr: 0009804c0270033f
 (XEN) csd : 0000000000000000 ssd : 0000000000000000
 (XEN) b0  : a00000010006feb0 b6  : a00000010006fd90 b7  : a0000001000180b0
 (XEN) f6  : 1003e0000000000000002 f7  : 1003eaf8af8af8af8af8b
 (XEN) f8  : 1003e0000000000000046 f9  : 1003e0000000000000002
 (XEN) f10 : 1003e0000000000000046 f11 : 1003eaf8af8af8af8af8b
-(XEN) r1  : a000000100c782a0 r2  : 000000001c032d24 r3  : 00000000e0196920
-(XEN) r8  : 0000000000000001 r9  : bfffffffabf67894 r10 : 150261daabf67894
-(XEN) r11 : 0000000000000000 r12 : e0000700cc02fb40 r13 : e0000700cc028000
-(XEN) r14 : 00000000000007ff r15 : 0000000000000024 r16 : 00000000c4163bfc
-(XEN) r17 : 00002a046fac7894 r18 : 150237d63c4a0000 r19 : 00002b5f6b049d80
-(XEN) r20 : 0000015afb5824ec r21 : 000000568dd07a3c r22 : 000000055c9ba3e4
-(XEN) r23 : 0000000620b1dfe0 r24 : 0000000620b1dfe0 r25 : a0007ff9df21c000
-(XEN) r26 : a000000100a918e8 r27 : a000000100a918e8 r28 : 0000000000000000
-(XEN) r29 : 0000000000024000 r30 : 0000000000000000 r31 : a0007fffffd39ff8
+(XEN) r1  : a000000100c782a0 r2  : ffffffffffffc000 r3  : 00000000c416ff77
+(XEN) r8  : 0000000000000001 r9  : a0007ff9df21c000 r10 : a000000100a918e8
+(XEN) r11 : 0000000000000000 r12 : e0000700cd06fb40 r13 : e0000700cd068000
+(XEN) r14 : a0007fffffd9bbb8 r15 : 0000000000000024 r16 : a0007fffffd98000
+(XEN) r17 : a0007fffffd9bbef r18 : 0000000000000000 r19 : 8000000000000001
+(XEN) r20 : e0000700f2a80030 r21 : 0000000000000006 r22 : e0000700f2a80000
+(XEN) r23 : 0000000000000000 r24 : a0007fffffcd4738 r25 : 0000000000000400
+(XEN) r26 : a000000100a91ce8 r27 : a000000100a91ce8 r28 : 000000001c03f800
+(XEN) r29 : a000000100a91cf0 r30 : a000000100a91cf0 r31 : a0007fffffcd4748
 (XEN) 
 (XEN) Call Trace:
 (XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0

Comment 13 Alexander Todorov 2010-12-20 16:51:17 UTC
On hp-rx2660-04 with kernel-2.6.18-225.el5xen I get a slightly different result: 

The first time the system panicked and rebooted (I wasn't connected to the serial console, but the ssh connection dropped and the system rebooted). After the reboot there are no guests running, and when trying to create a new guest I get:

# xencomm_privcmd_domctl: unknown domctl cmd 45
(XEN) No enough contiguous memory(16384KB) for init_domain_vhpt


# free -m
             total       used       free     shared    buffers     cached
Mem:          3446       1082       2363          0        107        454
-/+ buffers/cache:        521       2924
Swap:         5951          0       5951

Comment 14 Alexander Todorov 2010-12-20 17:10:08 UTC
Hi Laszlo,
on a second retry on hp-rx2660-04 this time with kernel-2.6.18-237.el5xen and xen-3.0.3-120.el5 I got:

xencomm_privcmd_domctl: unknown domctl cmd 45
(XEN) ia64_fault, vector=0x4, ifa=0xf300001e21e1df78, iip=0xf00000000407da30, ipsr=0x0000121008226038, isr=0x00000a0400000000
(XEN) Alt DTLB.
(XEN) d 0xf000000007c88080 domid 0
(XEN) vcpu 0xf000000007c68000 vcpu 0
(XEN) 
(XEN) CPU 0
(XEN) psr : 0000121008226038 ifs : 800000000000040d ip  : [<f00000000407da31>]
(XEN) ip is at domain_page_flush_and_put+0x441/0x500
(XEN) unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
(XEN) rnat: 0000000000000000 bsps: f0000000043a4b50 pr  : 000000000069c199
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f00000000407da30 b6  : f0000000040b98d0 b7  : f000000004002e50
(XEN) f6  : 0ffff8000000000000000 f7  : 000000000000000000000
(XEN) f8  : 000000000000000000000 f9  : 000000000000000000000
(XEN) f10 : 000000000000000000000 f11 : 000000000000000000000
(XEN) r1  : f0000000043a4b50 r2  : 0000000000000000 r3  : 000000000000003f
(XEN) r8  : 0000000000000000 r9  : 0000000000000000 r10 : a000000100a91920
(XEN) r11 : 0000000000000008 r12 : f000000007c6fdd0 r13 : f000000007c68000
(XEN) r14 : f000000007c60018 r15 : f300001e21e1df78 r16 : 0000000000000000
(XEN) r17 : f000000004c10480 r18 : 0000000000000000 r19 : 0000000000000001
(XEN) r20 : 0000000000000001 r21 : 0000000000000000 r22 : 0000001008226038
(XEN) r23 : 0000000000000000 r24 : 0000000000000020 r25 : 0000000000000000
(XEN) r26 : e0000100b5393dc0 r27 : 0000000000000000 r28 : a000000201154010
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f0000000041bc580
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000007c6fa00 bsp=f000000007c695e8
(XEN)  [<f000000004088430>] ia64_fault+0x9e0/0xbb0
(XEN)                                 sp=f000000007c6fbd0 bsp=f000000007c695a8
(XEN)  [<f0000000040b9320>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000007c6fbd0 bsp=f000000007c695a8
(XEN)  [<f00000000407da30>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fdd0 bsp=f000000007c69540
(XEN)  [<f000000004081970>] __dom0vp_add_physmap+0x330/0x630
(XEN)                                 sp=f000000007c6fde0 bsp=f000000007c694d8
(XEN)  [<f00000000405fa70>] do_dom0vp_op+0x1f0/0x560
(XEN)                                 sp=f000000007c6fdf0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe00 bsp=f000000007c69498
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Fault in Xen.
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...


Please advise whether this is the same issue or a different one; I will open another bug if it's different.

Comment 15 Paolo Bonzini 2010-12-20 21:02:43 UTC
I'll bisect tomorrow.

Comment 16 Laszlo Ersek 2010-12-21 13:36:45 UTC
I'm still trying to check if this is a regression. I have now provisioned another rx2600 series Integrity machine with RHEL5.5 and am about to create a virtual machine as described in comment 12. Since RHEL5.5 doesn't seem to support creation of HVM guests (based on my last attempt, which I was unable to finish before the provisioning time expired), I'll have to do it by writing/customizing a config file for "xm create", so it will be a bit slower.

Re "unknown domctl cmd 45", now I'm convinced it has nothing to do with the problem (or problems, plural, if comment 14 actually pertains to some other problem -- I can't say if it does). Both domctls mentioned in comment 2 were added to upstream Xen as no-ops:

XEN_DOMCTL_assign_device:

    http://xenbits.xensource.com/linux-2.6.18-xen.hg/rev/641

XEN_DOMCTL_test_assign_device:

    http://xenbits.xensource.com/linux-2.6.18-xen.hg/rev/741

It's also clear that the hypervisor is the one to crash. I'll try to comb through the xen-ia64-devel archives.

Comment 17 Laszlo Ersek 2010-12-21 14:06:14 UTC
(In reply to comment #16)
> RHEL5.5 doesn't seem to support creation of HVM guests

I meant "virt-manager in RHEL5.5", sorry.

Comment 18 Laszlo Ersek 2010-12-21 14:59:11 UTC
https://beaker.engineering.redhat.com/jobs/41014

Host:

# uname -psrn
Linux hp-rx2660-03.rhts.eng.bos.redhat.com 2.6.18-194.el5xen ia64

installed:
  xen 3.0.3-105.el5
  kernel-xen 2.6.18-194.el5
  xen-ia64-guest-firmware.ia64 0:1.0.13-1

I was able to create and start, with virt-manager, an HVM guest like described in comment 12, except the installation source was

  http://download.devel.redhat.com/released/RHEL-5-Server/U5/ia64/os/

However this is a different machine (the repro in comment 12 was made on hp-rx8640-03.rhts.eng.bos.redhat.com, see job 39757), so now I have to try to crash the HV with RHEL5.6.

Comment 19 Laszlo Ersek 2010-12-21 18:07:11 UTC
Job: https://beaker.engineering.redhat.com/jobs/41117
System: hp-rx2660-03.rhts.eng.bos.redhat.com
Distro: RHEL5.6-Server-20101221.n_nfs-ia64

Right after boot:

[root@hp-rx2660-03 ~]# uname -psrn
Linux hp-rx2660-03.rhts.eng.bos.redhat.com 2.6.18-238.el5 ia64

Installed the following with yum:

  xen-3.0.3-120.el5
  kernel-xen-2.6.18-238.el5

Installed the following with rpm:

  xen-ia64-guest-firmware-1.0.13-1

Rebooted host.

[root@hp-rx2660-03 ~]# uname -psrn
Linux hp-rx2660-03.rhts.eng.bos.redhat.com 2.6.18-238.el5xen ia64

Succeeded in creating and starting (with virt-manager) an HVM guest. The difference from comment 18 was the installation path (http://download.devel.redhat.com/nightly/RHEL5.6-Server-20101221.n/5/ia64/os/). At the EFI prompt I typed "fs0:" then "elilo" (thanks Paolo). The guest kernel booted and the installer started.

Thus the test on hp-rx2660-03 is inconclusive. Everything worked fine both with RHEL-5.5 and RHEL5.6-Server-20101221.n. I also saw none of the problems described in comment 13 and comment 14. (The machine affected by those, hp-rx2660-04, should be a "sibling" of hp-rx2660-03 -- that's why I reserved hp-rx2660-03 when I saw that hp-rx2660-04 was occupied.)

I'm waiting for the following queued jobs to start:

- 41012 -- hp-rx8640-02.rhts.eng.bos.redhat.com; need to check both with RHEL5.6 and RHEL5.5.
- 41013 -- hp-rx8640-03.rhts.eng.bos.redhat.com; need to check only with RHEL5.5 (RHEL5.6 already done in comment 12).

Current status:
- still unable to say if this is a regression
- no idea what causes either type of crash (and whether they have a common root cause)

Comment 20 Alexander Todorov 2010-12-22 07:58:43 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > RHEL5.5 doesn't seem to support creation of HVM guests
> 
> I meant "virt-manager in RHEL5.5", sorry.

It does support HVM guests. On ia64 however you need to install xen-ia64-guest-firmware package from the Supplementary CD.

Comment 21 Laszlo Ersek 2010-12-22 22:12:52 UTC
Created attachment 470318 [details]
console log of RHEL5.5 crash

(This is the RHEL5.5 half of the test initiated in comment 12, on the same
machine.)

https://beaker.engineering.redhat.com/jobs/41013

Host:

[root@hp-rx8640-03 ~]# uname -psrn
Linux hp-rx8640-03.rhts.eng.bos.redhat.com 2.6.18-194.el5 ia64

# yum install xen
(10/12): xen-3.0.3-105.el5.ia64.rpm
(12/12): kernel-xen-2.6.18-194.el5.ia64.rpm

# yum install xen-ia64-guest-firmware
  xen-ia64-guest-firmware.ia64 0:1.0.13-1

... append="xenheap_megabytes=122 --"

After reboot:

[root@hp-rx8640-03 ~]# uname -psrn
Linux hp-rx8640-03.rhts.eng.bos.redhat.com 2.6.18-194.el5xen ia64

Created an HVM guest similar to the one described in comment 12, using
http://download.devel.redhat.com/released/RHEL-5-Server/U5/ia64/os/ as the
boot image path. When the guest started, the hypervisor crashed. The console output was similar to that in comment #0, see the attachment.

Therefore this bug is not a regression.

Notably, if one diffs this attachment with the trace in comment #0, the invalid faulting address is the same:

 (XEN)  [<f0000705fc042080>] ???
 (XEN)                                 sp=f00000000883fba0 bsp=f000000008839478

Interestingly, the outermost frame is not fast_hypercall() this time; below that, there is

+(XEN)  [<f00000000409f080>] update_vhpi+0xb0/0xd0
+(XEN)                                 sp=f00000000883fe00 bsp=f000000008839308

Comment 22 Laszlo Ersek 2010-12-22 22:20:46 UTC
Status:
- Not a regression (comment 21).
- "XEN_DOMCTL_test_assign_device" is irrelevant wrt. the bug (comment 16).
- Hypervisor crash (comment 9).
- Cause of bug and contingent identity to comment 13 / comment 14 still unclear.

I strongly suspect (a) the bug is hardware dependent, and (b) I don't know enough about ia64 to go deeper than this.

Comment 24 Laszlo Ersek 2010-12-23 13:36:53 UTC
(In reply to comment #14)
> Hi Laszlo,
> on a second retry on hp-rx2660-04 this time with kernel-2.6.18-237.el5xen and
> xen-3.0.3-120.el5 I got:
> 
> xencomm_privcmd_domctl: unknown domctl cmd 45
> (XEN) ia64_fault, vector=0x4, ifa=0xf300001e21e1df78, iip=0xf00000000407da30,
> ipsr=0x0000121008226038, isr=0x00000a0400000000
> (XEN) Alt DTLB.
> (XEN) d 0xf000000007c88080 domid 0
> (XEN) vcpu 0xf000000007c68000 vcpu 0

Confirming this crash too.

                         HOSTNAME=hp-rx2660-04.rhts.eng.bos.redhat.com
                            JOBID=41493
                         RECIPEID=84487
                    RESULT_SERVER=127.0.0.1:7080
                           DISTRO=RHEL5.6-Server-20101221.n
                     ARCHITECTURE=ia64

[root@hp-rx2660-04 ~]# uname -psrn
Linux hp-rx2660-04.rhts.eng.bos.redhat.com 2.6.18-238.el5 ia64

xen-3.0.3-120.el5.ia64.rpm
kernel-xen-2.6.18-238.el5.ia64.rpm
xen-ia64-guest-firmware-1.0.13-1.ia64.rpm

[root@hp-rx2660-04 ~]# uname -psrn
Linux hp-rx2660-04.rhts.eng.bos.redhat.com 2.6.18-238.el5xen ia64

- virt method: fully virtualized
- init mem: 2G
- max mem: 2G
- numvcpu: 2
- OS: RHEL 5.4 or later
- Installation source:
  http://download.devel.redhat.com/nightly/RHEL5.6-Server-20101222.0/5/ia64/os/
- disk image: file based
- disk size: 4G
- network: shared physdev
- network target: xenbr0
- sound: off

Trace diff:

< xencomm_privcmd_domctl: unknown domctl cmd 45
< (XEN) ia64_fault, vector=0x4, ifa=0xf300001e21e1df78, iip=0xf00000000407da30, ipsr=0x0000121008226038, isr=0x00000a0400000000
---
> Red Hat Enterprise Linux Server release 5.6 (Tikanga)
> Kernel 2.6.18-238.el5xen on an ia64
> 
> hp-rx2660-04.rhts.eng.bos.redhat.com login: xencomm_privcmd_domctl: unknown domctl cmd 45
> (XEN) ia64_fault, vector=0x4, ifa=0xf300001e21e1df78, iip=0xf00000000407da30, ipsr=0x0000121008226018, isr=0x00000a0400000000
8c11
< (XEN) psr : 0000121008226038 ifs : 800000000000040d ip  : [<f00000000407da31>]
---
> (XEN) psr : 0000121008226018 ifs : 800000000000040d ip  : [<f00000000407da31>]
11c14
< (XEN) rnat: 0000000000000000 bsps: f0000000043a4b50 pr  : 000000000069c199
---
> (XEN) rnat: 0000000000000011 bsps: f0000000041af5c8 pr  : 000000000069c199
23c26
< (XEN) r20 : 0000000000000001 r21 : 0000000000000000 r22 : 0000001008226038
---
> (XEN) r20 : 0000000000000001 r21 : 0000000000000000 r22 : 0000001008226018
25c28
< (XEN) r26 : e0000100b5393dc0 r27 : 0000000000000000 r28 : a000000201154010
---
> (XEN) r26 : e0000100c6593540 r27 : 0000000000000000 r28 : a000000201150010

Will check with RHEL5.5 too.

Comment 25 Laszlo Ersek 2010-12-23 14:28:03 UTC
(In reply to comment #24)

> Confirming this crash too.

> [root@hp-rx2660-04 ~]# uname -psrn
> Linux hp-rx2660-04.rhts.eng.bos.redhat.com 2.6.18-238.el5xen ia64
> 
> - virt method: fully virtualized
> - init mem: 2G
> - max mem: 2G
> - numvcpu: 2
> - OS: RHEL 5.4 or later
> - Installation source:
>   http://download.devel.redhat.com/nightly/RHEL5.6-Server-20101222.0/5/ia64/os/
> - disk image: file based
> - disk size: 4G
> - network: shared physdev
> - network target: xenbr0
> - sound: off

> Will check with RHEL5.5 too.

                         HOSTNAME=hp-rx2660-04.rhts.eng.bos.redhat.com
                            JOBID=41787
                         RECIPEID=85222
                    RESULT_SERVER=127.0.0.1:7094
                           DISTRO=RHEL5-Server-U5
                     ARCHITECTURE=ia64

[root@hp-rx2660-04 ~]# uname -psrn
Linux hp-rx2660-04.rhts.eng.bos.redhat.com 2.6.18-194.el5 ia64

(15/18): xen-ia64-guest-firmware-1.0.13-1.ia64.rpm
(16/18): xen-3.0.3-105.el5.ia64.rpm
(18/18): kernel-xen-2.6.18-194.el5.ia64.rpm

[root@hp-rx2660-04 ~]# uname -psrn
Linux hp-rx2660-04.rhts.eng.bos.redhat.com 2.6.18-194.el5xen ia64

http://download.devel.redhat.com/released/RHEL-5-Server/U5/ia64/os/

Hypervisor crashed again, this time with a very deep mutual recursion.
Crash is possibly due to stack overflow.

The crash on this machine is not a regression either.

Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Kernel 2.6.18-194.el5xen on an ia64

hp-rx2660-04.rhts.eng.bos.redhat.com login: xencomm_privcmd_domctl: unknown domctl cmd 45
(XEN) ia64_fault, vector=0x4, ifa=0xf300001e21e1df78, iip=0xf00000000407da10, ipsr=0x0000121008226038, isr=0x00000a0400000000
(XEN) Alt DTLB.
(XEN) d 0xf000000007c88080 domid 0
(XEN) vcpu 0xf000000007c68000 vcpu 0
(XEN) 
(XEN) CPU 0
(XEN) psr : 0000121008226038 ifs : 800000000000040d ip  : [<f00000000407da11>]
(XEN) ip is at domain_page_flush_and_put+0x441/0x500
(XEN) unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
(XEN) rnat: 0000000000000000 bsps: f0000000043a4b50 pr  : 000000000069c199
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f00000000407da10 b6  : f0000000040b97f0 b7  : f000000004002e50
(XEN) f6  : 0ffff8000000000000000 f7  : 000000000000000000000
(XEN) f8  : 000000000000000000000 f9  : 000000000000000000000
(XEN) f10 : 000000000000000000000 f11 : 000000000000000000000
(XEN) r1  : f0000000043a4b50 r2  : 0000000000000000 r3  : 000000000000003f
(XEN) r8  : 0000000000000000 r9  : 0000000000000000 r10 : a000000100a82830
(XEN) r11 : 0000000000000008 r12 : f000000007c6fdd0 r13 : f000000007c68000
(XEN) r14 : f000000007c60018 r15 : f300001e21e1df78 r16 : 0000000000000000
(XEN) r17 : f000000004c10480 r18 : 0000000000000000 r19 : 0000000000000001
(XEN) r20 : 0000000000000001 r21 : 0000000000000000 r22 : 0000001008226038
(XEN) r23 : 0000000000000000 r24 : 0000000000000020 r25 : 0000000000000000
(XEN) r26 : e0000100c4da6dc0 r27 : 0000000000000000 r28 : a00000020105c010
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f0000000041bc580
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040c0450>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000007c6fa00 bsp=f000000007c695e8
(XEN)  [<f000000004088340>] ia64_fault+0x9e0/0xbb0
(XEN)                                 sp=f000000007c6fbd0 bsp=f000000007c695a8
(XEN)  [<f0000000040b9240>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000007c6fbd0 bsp=f000000007c695a8
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fdd0 bsp=f000000007c69540
(XEN)  [<f000000004081950>] __dom0vp_add_physmap+0x330/0x630
(XEN)                                 sp=f000000007c6fde0 bsp=f000000007c694d8
(XEN)  [<f00000000405fa50>] do_dom0vp_op+0x1f0/0x560
(XEN)                                 sp=f000000007c6fdf0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe00 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe00 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe10 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe10 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe20 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe20 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe30 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe30 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe40 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe40 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe50 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe50 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe60 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe60 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe70 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe70 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe80 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe80 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fe90 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fe90 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fea0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fea0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6feb0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6feb0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fec0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fec0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fed0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fed0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fee0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fee0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fef0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fef0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff00 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff00 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff10 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff10 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff20 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff20 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff30 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff30 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff40 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff40 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff50 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff50 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff60 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff60 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff70 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff70 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff80 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff80 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ff90 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ff90 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ffa0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ffa0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ffb0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ffb0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ffc0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ffc0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ffd0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ffd0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6ffe0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6ffe0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c6fff0 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c6fff0 bsp=f000000007c69498
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007c70000 bsp=f000000007c69498
(XEN)  [<f00000000407da10>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007c70000 bsp=f000000007c69498
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Fault in Xen.
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...

Comment 27 Laszlo Ersek 2010-12-27 15:32:23 UTC
Created attachment 470857 [details]
hp-rx2660-04 crash logs and hp-rx2660-01.brq / hp-rx2660-04 Xen boot logs

Based on the advice I received, this is/was the plan:

1. Find the first RHEL5 xen hv on hp-rx2660-04 that broke the hvm guest start. Originally that machine must have been well supported.

2. If "1" finds a working RHEL xen on hp-rx2660-04, try it out on hp-rx8640-03.

3. Try to find a xen-3 upstream release working on hp-rx8640-03. hg repo to use: http://xenbits.xensource.com/xen-unstable.hg

4. This upstream repo may have a greater chance to work for step "3":
http://xenbits.xensource.com/ext/ia64/xen-unstable.hg

5. Builds of upstream xen ("3"/"4") might work better with open source EFI:
http://xenbits.xensource.com/ext/efi-vfirmware.hg?file/3ad73b4314e3/binaries/

6. Compare "xm dmesg" (hypervisor boot messages) right after boot on
hp-rx2660-04 vs. hp-rx2660-01.brq (should be similar) and check if the latter crashes as well.

----o----

Execution status:

**** 1. Find the first RHEL5 xen hv on hp-rx2660-04 that broke the hvm guest start. Originally that machine must have been well supported.

I grabbed the RHEL5 xen tags and the RHEL5 kernel tags, took their intersection, then looked up the ia64 kernel-xen RPMs for that set (not all versions could be found in brew). I manually, partially "bisected" the resultant list: I checked 17 versions, of which 16 crashed when starting the HVM guest, and 1 simply didn't work. Here's the command I used for HVM guest creation:

    virt-install \
    --connect=xen \
    --name=rhel56-hvm \
    --ram=2048 \
    --arch=ia64 \
    --vcpus=2 \
    --os-type=linux \
    --os-variant=rhel5.4 \
    --hvm \
    --location=http://download.devel.redhat.com/nightly/RHEL5.6-Server-20101222.n/5/ia64/os/ \
    --disk=path=/var/lib/xen/images/rhel56-hvm,size=4,sparse=true \
    --network=bridge:xenbr0 \
    --debug \
    --prompt

This is the list of tested versions and the tag dates:

    2.6.18-62.el5   Dec  21  2007  15:44:52  -0500
    2.6.18-91.el5   Apr  23  2008  15:11:43  -0400
    2.6.18-124.el5  Nov  18  2008  16:24:38  -0500
    2.6.18-146.el5  May  12  2009  13:55:39  -0400
    2.6.18-162.el5  Aug  5   2009  10:33:16  -0400
    2.6.18-178.el5  Dec  9   2009  13:13:24  -0500
    2.6.18-194.el5  Mar  17  2010  11:53:52  -0400
    2.6.18-201.el5  May  28  2010  11:22:16  -0400
    2.6.18-208.el5  Jul  22  2010  18:10:08  -0400
    2.6.18-212.el5  Aug  11  2010  23:00:55  -0400
    2.6.18-219.el5  Sep  9   2010  16:51:06  -0400
    2.6.18-221.el5  Sep  13  2010  21:38:00  -0400
    2.6.18-222.el5  Sep  15  2010  21:51:35  -0400
    2.6.18-225.el5  Sep  27  2010  10:08:45  -0400
    2.6.18-230.el5  Oct  28  2010  16:37:07  -0400
    2.6.18-235.el5  Dec  1   2010  11:54:29  -0500
    2.6.18-238.el5  Dec  19  2010  13:48:52  -0500
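The manual narrowing described above amounts to a binary search over this date-sorted list. A minimal sketch of picking the next build to test (the array below is an abbreviated, illustrative subset of the versions; in this bug every tested build crashed, so the bisection never found a good endpoint):

```shell
# Sketch of manual bisection over a sorted list of candidate builds.
# lo/hi would track the untested window between a known-good and a
# known-bad build; here they simply span the whole (abbreviated) list.
versions=(2.6.18-62.el5 2.6.18-124.el5 2.6.18-194.el5 2.6.18-238.el5)
lo=0
hi=$(( ${#versions[@]} - 1 ))
mid=$(( (lo + hi) / 2 ))
echo "next build to test: ${versions[mid]}"
```

After testing the midpoint build, one would set `lo=$((mid + 1))` or `hi=$((mid - 1))` depending on whether it crashed, and repeat.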

The crash logs are in the "hp-rx2660-04.results" directory. Version -221 didn't crash, but refused guest creation with "(XEN) No enough contiguous memory(16384KB) for init_domain_vhpt".


**** 2. If "1" finds a working RHEL xen on hp-rx2660-04, try it out on hp-rx8640-03.

N/A


**** 3. Try to find a xen-3 upstream release working on hp-rx8640-03. hg repo to use: http://xenbits.xensource.com/xen-unstable.hg

The NUMA configuration of hp-rx8640-03 is not optimal for Xen. Until that is fixed (see comment 26), this step is suspended.

To work a bit in advance, I built 19 tagged dist bundles from the abovementioned hg repo. Once the NUMA config is fixed, those may be used for manual bisection.

    dist-3.0.3-branched.tar.bz2  dist-3.3.0-rc3.tar.bz2
    dist-3.0.4-branched.tar.bz2  dist-3.3.0-rc4.tar.bz2
    dist-3.1.0-branched.tar.bz2  dist-3.3.0-rc5.tar.bz2
    dist-3.2.0-rc1.tar.bz2       dist-3.3.0-rc6.tar.bz2
    dist-3.2.0-rc2.tar.bz2       dist-3.3.0-rc7.tar.bz2
    dist-3.2.0-rc3.tar.bz2       dist-3.3.0-branched.tar.bz2
    dist-3.2.0-rc4.tar.bz2       dist-3.4.0-rc1.tar.bz2
    dist-3.2.0-rc5.tar.bz2       dist-3.4.0-rc2.tar.bz2
    dist-3.2.0-rc6.tar.bz2       dist-3.4.0-rc3.tar.bz2
    dist-3.3.0-rc1.tar.bz2

This script was used for building:

    #!/bin/bash
    set -e -u -x -C

    cd ~/src-hg/xen-unstable.hg
    # ~/tested-upstream-xen-revisions holds one "TAG REV" pair per line.
    while read TAG REV; do
      make -k distclean
      hg checkout $TAG
      make -j 40 world 2>&1 | tee -i ~/xenbuildlog/$TAG.log
      # Package the freshly built dist/ tree, one tarball per tag.
      tar -c -v -f ~/xenbin/dist-$TAG.tar dist
    done <~/tested-upstream-xen-revisions 2>&1 \
    | tee -i ~/buildall2.log


**** 4. This upstream repo may have a greater chance to work for step "3":
http://xenbits.xensource.com/ext/ia64/xen-unstable.hg

See the status for the previous bullet:

    dist-ext-ia64-3.0.3-branched.tar.bz2  dist-ext-ia64-3.3.0-rc3.tar.bz2
    dist-ext-ia64-3.0.4-branched.tar.bz2  dist-ext-ia64-3.3.0-rc4.tar.bz2
    dist-ext-ia64-3.1.0-branched.tar.bz2  dist-ext-ia64-3.3.0-rc5.tar.bz2
    dist-ext-ia64-3.2.0-rc1.tar.bz2       dist-ext-ia64-3.3.0-rc6.tar.bz2
    dist-ext-ia64-3.2.0-rc2.tar.bz2       dist-ext-ia64-3.3.0-rc7.tar.bz2
    dist-ext-ia64-3.2.0-rc3.tar.bz2       dist-ext-ia64-3.3.0-branched.tar.bz2
    dist-ext-ia64-3.2.0-rc4.tar.bz2       dist-ext-ia64-3.4.0-rc1.tar.bz2
    dist-ext-ia64-3.2.0-rc5.tar.bz2       dist-ext-ia64-3.4.0-rc2.tar.bz2
    dist-ext-ia64-3.2.0-rc6.tar.bz2       dist-ext-ia64-3.4.0-rc3.tar.bz2
    dist-ext-ia64-3.3.0-rc1.tar.bz2


**** 5. Builds of upstream xen ("3"/"4") might work better with open source EFI:
http://xenbits.xensource.com/ext/efi-vfirmware.hg?file/3ad73b4314e3/binaries/

Suspended, see above.


**** 6. Compare "xm dmesg" (hypervisor boot messages) right after boot on
hp-rx2660-04 vs. hp-rx2660-01.brq (should be similar) and check if the latter crashes as well.

hp-rx2660-01.rhts.eng.brq.redhat.com didn't crash but successfully created the guest with the virt-install command described in bullet 1. (This is the second hp-rx2660 that has no problem creating the guest, see comment 18 and comment 19). Both machines were provisioned with RHEL5.6-Server-20101222.n_nfs-ia64 and booted with 2.6.18-238.el5xen. The "xm dmesg" outputs are in the attached tarball. To me the only non-trivial difference seems to be

+(XEN) Reducing dom0 memory allocation from 4194304K to 3928560K to fit available memory

in the hp-rx2660-04 xen boot log.
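The log comparison above can be reproduced with a small sketch; the file names and their contents below are invented for illustration, standing in for the two captured "xm dmesg" outputs:

```shell
# Hypothetical sketch: surface boot-log lines unique to each host.
# File names and contents are made up for illustration.
printf '(XEN) common line\n(XEN) only on -01\n' > rx2660-01.log
printf '(XEN) common line\n(XEN) Reducing dom0 memory allocation\n' > rx2660-04.log
# comm requires sorted input; -3 suppresses lines common to both files,
# leaving lines unique to -01 (column 1) and unique to -04 (indented).
LC_ALL=C comm -3 <(LC_ALL=C sort rx2660-01.log) <(LC_ALL=C sort rx2660-04.log)
```

Pinning `LC_ALL=C` keeps `sort` and `comm` agreeing on collation order, which matters for logs mixing upper- and lower-case hypervisor messages.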


**** Summary

hp-rx2660-04 appears to be a dead-end. hp-rx8640-03 needs a NUMA reconfig
before testing can continue.

Comment 28 Laszlo Ersek 2011-01-10 09:46:05 UTC
Matt Brodeur provided expert help.

On 01/07/11 20:39, Matthew Brodeur wrote:

> The 8640s are configured to have two different configurations as
> they'd be used in the field.  The rx8640-03 uses interleaved memory,
> where -02 has a flat configuration.

> https://engineering.redhat.com/rt3/Ticket/Display.html?id=28539#txn-679741

>> Matt,
>> 
>> I configured this system to use NUMA while -02 is set up not to use NUMA.
>> With the NUMA config you cannot run xen. The MCA you saw is expected in
>> this case. Sorry, I really should have mentioned that bit!.
>> 
>> - Doug

Alex, I'm closing this as NOTABUG, as it was reported for hp-rx8640-03. The other "problematic machine" mentioned in this bug was hp-rx2660-04. Alas, I couldn't determine the cause of the crash. The difference between hp-rx2660-04 and hp-rx2660-01/hp-rx2660-03 might be NUMA again, or it might be something else. If you want that machine to be investigated further, please clone this bug (if appropriate), or please submit an independent bug. Thank you.

Comment 29 Laszlo Ersek 2011-04-14 14:26:00 UTC
*** Bug 696599 has been marked as a duplicate of this bug. ***

