Bug 431001 - [RHEL5 U2] Kernel-xen Panic on CPU 1: ia64_fault, When trying to install a xen guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: ia64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jarod Wilson
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/tes...
Whiteboard:
Depends On:
Blocks:
Reported: 2008-01-31 00:51 UTC by Jeff Burke
Modified: 2008-05-21 15:08 UTC

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:08:35 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Jeff Burke 2008-01-31 00:51:12 UTC
Description of problem:
 While trying to test the xen kernel on ia64 with
hp-lp1.rhts.boston.redhat.com, the hypervisor panics when installing a guest.

Version-Release number of selected component (if applicable):
 2.6.18-76.el5

How reproducible:
 Unknown

Steps to Reproduce:
1. Using host hp-lp1.rhts.boston.redhat.com. Install RHEL5.U1
2. Install 2.6.18-76.el5 kernel-xen, reboot
3. Try installing a xen guest

Actual results:

(XEN) ia64_fault, vector=0x4, ifa=0xf300000a00003210, iip=0xf000000004078850,
ipsr=0x0000121008226018, isr=0x00000a0400000000
(XEN) Alt DTLB.
(XEN) d 0xf000000007bb0080 domid 0
(XEN) vcpu 0xf000000007b88000 vcpu 1
(XEN) 
(XEN) CPU 1
(XEN) psr : 0000121008226018 ifs : 800000000000040d ip  : [<f000000004078851>]
(XEN) ip is at domain_page_flush_and_put+0x441/0x500
(XEN) unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
(XEN) rnat: 0000000000000000 bsps: f000000004394a20 pr  : 0000000000698999
(XEN) ldrs: 0000000000000000 ccv : 000000000000fcda fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f000000004078850 b6  : f0000000040b3330 b7  : f000000004002e50
(XEN) f6  : 0ffff8000000000000000 f7  : 000000000000000000000
(XEN) f8  : 000000000000000000000 f9  : 000000000000000000000
(XEN) f10 : 000000000000000000000 f11 : 000000000000000000000
(XEN) r1  : f000000004394a20 r2  : 0000000000000000 r3  : 000000000000003f
(XEN) r8  : 0000000000000000 r9  : 0000000000000000 r10 : a000000100a2a138
(XEN) r11 : 0000000000000008 r12 : f000000007b8fdd0 r13 : f000000007b88000
(XEN) r14 : f000000007b88018 r15 : f300000a00003210 r16 : 0000000000000001
(XEN) r17 : 0000000000000001 r18 : 0000000000000000 r19 : 0000000000000001
(XEN) r20 : 0000000000000001 r21 : 0000000000000000 r22 : 0000001008226018
(XEN) r23 : 0000000000000000 r24 : 0000000000000001 r25 : f0000000041a2610
(XEN) r26 : 0000000000000001 r27 : 0000000000000000 r28 : a000000200938010
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f0000000041ab280
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040b9f90>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000007b8fa00 bsp=f000000007b89588
(XEN)  [<f000000004080580>] ia64_fault+0xa30/0xad0
(XEN)                                 sp=f000000007b8fbd0 bsp=f000000007b89550
(XEN)  [<f0000000040b2d80>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000007b8fbd0 bsp=f000000007b89550
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fdd0 bsp=f000000007b894e8
(XEN)  [<f00000000407c620>] __dom0vp_add_physmap+0x330/0x630
(XEN)                                 sp=f000000007b8fde0 bsp=f000000007b89480
(XEN)  [<f00000000405c910>] do_dom0vp_op+0x1f0/0x560
(XEN)                                 sp=f000000007b8fdf0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe00 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe00 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe10 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe10 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe20 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe20 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe30 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe30 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe40 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe40 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe50 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe50 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe60 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe60 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe70 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe70 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe80 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe80 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fe90 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fe90 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fea0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fea0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8feb0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8feb0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fec0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fec0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fed0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fed0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fee0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fee0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fef0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fef0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff00 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff00 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff10 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff10 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff20 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff20 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff30 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff30 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff40 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff40 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff50 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff50 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff60 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff60 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff70 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff70 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff80 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff80 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ff90 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ff90 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ffa0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ffa0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ffb0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ffb0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ffc0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ffc0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ffd0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ffd0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8ffe0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8ffe0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b8fff0 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b8fff0 bsp=f000000007b89440
(XEN)  [<f000000004002e80>] fast_hypercall+0x170/0x2f0
(XEN)                                 sp=f000000007b90000 bsp=f000000007b89440
(XEN)  [<f000000004078850>] domain_page_flush_and_put+0x440/0x500
(XEN)                                 sp=f000000007b90000 bsp=f000000007b89440
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Fault in Xen.
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
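
When comparing panics across runs, it can help to pull the fault header fields out of the console log. The following is a minimal sketch, assuming the line format shown above; the function names are hypothetical. On this log it shows that the faulting address (ifa) is the value held in r15.

```python
import re

# Fault header and one register line copied from the console output above.
LOG = """\
(XEN) ia64_fault, vector=0x4, ifa=0xf300000a00003210, iip=0xf000000004078850,
ipsr=0x0000121008226018, isr=0x00000a0400000000
(XEN) r14 : f000000007b88018 r15 : f300000a00003210 r16 : 0000000000000001
"""

def parse_fault_header(log):
    """Return the name=0x... fields of the ia64_fault header as integers."""
    start = log.index("ia64_fault")
    # The header may wrap onto a second line, so scan up to the next "(XEN)".
    end = log.find("(XEN)", start)
    header = log[start:end if end != -1 else len(log)]
    return {k: int(v, 16) for k, v in re.findall(r"(\w+)=0x([0-9a-fA-F]+)", header)}

def parse_registers(log):
    """Return 'name : 16-hex-digit value' pairs from dump lines as integers."""
    return {k: int(v, 16) for k, v in re.findall(r"\b(\w+)\s*: ([0-9a-f]{16})\b", log)}

fault = parse_fault_header(LOG)
regs = parse_registers(LOG)
print(hex(fault["ifa"]), fault["ifa"] == regs["r15"])
```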

Additional info:
 Several people have tried, without success, to reproduce this on other systems,
so this may be host specific. This system did boot and install guests with the
-75.el5 xen and the -74.el5 xen kernels. It also works in RHEL5.U1.

Comment 1 Chris Lalancette 2008-01-31 02:06:14 UTC
Jeff,
     Ah, thanks for getting the logs.  Given that this is a hypervisor failure,
it is even less likely that it is the /sbin/init patch that I put into -76. 
What is also odd, however, is that the other Xen (> 3 vNics) patch that went
into -76 was a kernel patch, not an HV patch.  So one of those patches must be
tickling the hypervisor in a way it wasn't doing before.

Anyway, given the easy test case, it should be easy enough to back those patches
out one at a time and find out which one it actually is (although it seems it
needs to be done on that particular hardware).

Chris Lalancette

Comment 2 Doug Chapman 2008-01-31 22:56:48 UTC
Jarod hit this same issue a while back on this same system.  We closed that BZ
as not reproducible.  Adding him to the cc list.


Comment 3 Jeff Burke 2008-01-31 23:25:46 UTC
I just reserved the machine and re-ran the test. I was able to reproduce it the
first time I tried:

(XEN) ia64_fault, vector=0x4, ifa=0xf300000a00003210, iip=0xf000000004078850,
ipsr=0x0000121008226018, isr=0x00000a0400000000
(XEN) Alt DTLB.
(XEN) d 0xf000000007bb0080 domid 0
(XEN) vcpu 0xf000000007b88000 vcpu 1
(XEN) 
(XEN) CPU 1
(XEN) psr : 0000121008226018 ifs : 800000000000040d ip  : [<f000000004078851>]
(XEN) ip is at domain_page_flush_and_put+0x441/0x500
(XEN) unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
(XEN) rnat: 0000000000000000 bsps: f000000004394a20 pr  : 0000000000698999
(XEN) ldrs: 0000000000000000 ccv : 000000000000f748 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : f000000004078850 b6  : f0000000040b3330 b7  : f000000004002e50
(XEN) f6  : 0ffff8000000000000000 f7  : 000000000000000000000
(XEN) f8  : 000000000000000000000 f9  : 000000000000000000000
(XEN) f10 : 000000000000000000000 f11 : 000000000000000000000
(XEN) r1  : f000000004394a20 r2  : 0000000000000000 r3  : 000000000000003f
(XEN) r8  : 0000000000000000 r9  : 0000000000000000 r10 : a000000100a2a138
(XEN) r11 : 0000000000000008 r12 : f000000007b8fdd0 r13 : f000000007b88000
(XEN) r14 : f000000007b88018 r15 : f300000a00003210 r16 : 0000000000000001
(XEN) r17 : 0000000000000001 r18 : 0000000000000000 r19 : 0000000000000001
(XEN) r20 : 0000000000000001 r21 : 0000000000000000 r22 : 0000001008226018
(XEN) r23 : 0000000000000000 r24 : 0000000000000001 r25 : f0000000041a2610
(XEN) r26 : 0000000000000001 r27 : 0000000000000000 r28 : a000000200938010
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f0000000041ab280

So this really looks like a machine-specific thing. I have the system reserved
for the next 48 hours. Let me know if anyone needs access to it.

Comment 7 Jeff Burke 2008-02-05 19:25:13 UTC
Kei or Tetsu,
   Any update on this issue?

Thanks,
Jeff

Comment 8 Tetsu Yamamoto 2008-02-05 20:37:21 UTC
I've not found the cause yet, but it seems that the kernel before -75 also had 
this problem.

* kernel -76 without the following patches still reproduces the problem:
 - [Xen] gnttab: allow more than 3 VNIFs (Tetsu Yamamoto ) [297331]
 - [xen] fix /sbin/init to use cpu_possible (Chris Lalancette ) [430310]
* kernel -75 also reproduces the problem (but not always).


Comment 9 Tetsu Yamamoto 2008-02-11 15:31:45 UTC
If dom0_mem is specified as a hypervisor parameter, such as dom0_mem=1G, this
problem does not occur.
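
For reference, a minimal sketch of applying this workaround to a boot entry's option string. It assumes the ia64 elilo.conf convention in which hypervisor options come before a `--` separator in the `append=` string and dom0 kernel options come after it; the helper name `add_hypervisor_param` and the sample option string are hypothetical.

```python
def add_hypervisor_param(append, param):
    """Return an append string with `param` added on the hypervisor side.

    Assumes the form "<hypervisor opts> -- <dom0 kernel opts>"; if there is
    no "--" separator, the whole string is treated as dom0 kernel options.
    """
    tokens = append.split()
    if "--" in tokens:
        idx = tokens.index("--")
        hv_opts, kernel_opts = tokens[:idx], tokens[idx + 1:]
    else:
        hv_opts, kernel_opts = [], tokens

    key = param.split("=", 1)[0]
    # Drop any earlier setting of the same option so the new value wins.
    hv_opts = [o for o in hv_opts if o.split("=", 1)[0] != key]
    hv_opts.append(param)
    return " ".join(hv_opts + ["--"] + kernel_opts)


# Hypothetical existing option string from an elilo.conf entry.
print(add_hypervisor_param("-- rhgb quiet", "dom0_mem=1G"))
```

The result would go back into the entry's `append=` line, followed by a reboot so the hypervisor picks it up.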

Comment 10 Jarod Wilson 2008-02-11 15:48:32 UTC
So most likely, the hypervisor isn't being reserved quite enough memory to do
its thing if no param is specified. Without any dom0_mem param, we'll try to
grab up to 4G of memory for dom0, and this box only has 2G. In theory, we're
still reserving enough for the hypervisor (we take less than the full 2G here),
but it's sort of a guesstimate.
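
The sizing described here can be sketched as simple arithmetic. This is an illustration only, not the hypervisor's code: the function name and the 256MB reserve figure are hypothetical, and only the 4G cap, the 2G box, the extra 64MB (comment 27) and dom0_mem=1G (comment 9) come from this bug.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def default_dom0_mem(total_ram, hv_reserve, dom0_cap=4 * GIB):
    """Memory dom0 gets when no dom0_mem parameter is given.

    dom0 takes whatever is left after the hypervisor's reserve, up to
    dom0_cap. hv_reserve is the "guesstimate": if it is too small, the
    hypervisor can run short once guests are being created.
    """
    return max(0, min(dom0_cap, total_ram - hv_reserve))


total = 2 * GIB          # the box in this bug has 2G
reserve = 256 * MIB      # hypothetical figure, not Xen's real reserve

for label, hv_reserve in [("default", reserve), ("+64MB", reserve + 64 * MIB)]:
    dom0 = default_dom0_mem(total, hv_reserve)
    print(f"{label}: dom0={dom0 // MIB} MiB, left over={(total - dom0) // MIB} MiB")

# An explicit dom0_mem=1G leaves the other 1G for the hypervisor and guests.
print(f"dom0_mem=1G: left over={(total - 1 * GIB) // MIB} MiB")
```

On a box with less RAM than the cap, the cap never applies, so the reserve is the only thing keeping dom0 from taking everything.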

Comment 11 Doug Chapman 2008-02-13 16:28:18 UTC
HP's recommendation for minimum memory w/xen on ia64 is 4GB (not sure what the
official Red Hat recommendation is).  Since this bug appears to be completely
related to the small amount of memory on the system my recommendation would be
to not test xen on this box.  Also, this system is no longer shipped by HP so we
never officially certified it for xen.




Comment 15 Bill Burns 2008-02-14 15:48:57 UTC
Comment #9 provides a workaround that may need to become a release note if the
underlying cause is not fixed.


Comment 20 Don Domingo 2008-02-15 00:10:01 UTC
i didn't get what triggers this error, but adding the following note to RHEL5.2
release notes ("Known Issues") for your review:

<quote>
(ia64) If your system has less than 2GB of memory, a kernel panic may occur (on
the host's kernel) if you attempt to create a guest. When this occurs, use the
kernel parameter dom0_mem=1G on the hypervisor kernel before retrying.
</quote>

please advise if any further revisions are required. thanks!

Comment 21 Bill Burns 2008-02-15 01:51:35 UTC
Probably should read "if your system has 2GB or less".

Comment 22 Don Domingo 2008-02-15 02:11:00 UTC
thanks Bill, revised as:

<quote>
If your system has 2GB of memory (or less)...
</quote>

Comment 23 Jarod Wilson 2008-02-15 04:38:37 UTC
However, I believe the plan is to fix this, making the note unnecessary... :)

Comment 24 Jarod Wilson 2008-02-17 02:49:12 UTC
Meh. Finally got the system reserved and got a xen kernel with a test patch
installed and booted, but I can't get a guest install to go anywhere:

Starting install...
libvir: Xen error : Domain not found: xenUnifiedDomainLookupByUUID
Retrieving file Server...                                       494 kB 00:00 
Retrieving file vmlinuz.. 100% |=========================| 3.4 MB    00:00     
Retrieving file initrd.im 100% |=========================| 9.2 MB    00:01     
libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
libvir: Xen error : Domain not found: xenUnifiedDomainLookupByID
virDomainLookupByID() failed Domain not found: xenUnifiedDomainLookupByID
Domain installation may not have been
 successful.  If it was, you can restart your domain
 by running 'virsh start test'; otherwise, please
 restart your installation.
Sat, 16 Feb 2008 21:46:51 ERROR    virDomainLookupByID() failed Domain not
found: xenUnifiedDomainLookupByID
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 517, in ?
    main()
  File "/usr/sbin/virt-install", line 481, in main
    dom = guest.start_install(conscb,progresscb)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 813, in
start_install
    return self._do_install(consolecb, meter)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 829, in
_do_install
    self._create_devices(meter)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 727, in
_create_devices
    nic.setup(self.conn)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 281, in setup
    vm = conn.lookupByID(id)
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 638, in lookupByID
    if ret is None:raise libvirtError('virDomainLookupByID() failed', conn=self)
libvirtError: virDomainLookupByID() failed Domain not found:
xenUnifiedDomainLookupByID

Comment 25 Jeff Burke 2008-02-17 14:59:52 UTC
Jarod,
 Using these steps to reproduce:
1. Using host hp-lp1.rhts.boston.redhat.com. Install RHEL5.U1
2. Install 2.6.18-76.el5 kernel-xen, reboot
3. Try installing a xen guest

 I was able to reproduce the issue every time I tried. It looks like you used
the RHEL5.2-Server-20080214.nightly baseline. Also, I didn't add any additional
cmdline options.

Comment 27 Jarod Wilson 2008-02-18 19:10:57 UTC
Okay, so I was able to reproduce the panic on 2.6.18-81.el5xen. Trying the same
thing on a -81 kernel patched to reserve an additional 64MB of RAM for the
hypervisor, no more kernel panic. However, guest installation still fails:

virt-install spew:
------------------
Starting install...
Retrieving Server...                                            482 kB 00:00 
Retrieving vmlinuz...     100% |=========================| 3.4 MB    00:00     
Retrieving initrd.img...  100% |=========================| 8.8 MB    00:01     
Creating storage file...  100% |=========================|  10 GB    03:52     
libvir: Xen Daemon error : POST operation failed: (xend.err "Error creating
domain: (1, 'Internal error', 'launch_vm: SETVCPUCONTEXT failed (rc=-1)\\n')")
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 633, in ?
    main()
  File "/usr/sbin/virt-install", line 578, in main
    dom = guest.start_install(conscb,progresscb)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 649, in
start_install
    return self._do_install(consolecb, meter)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 666, in
_do_install
    self.domain = self.conn.createLinux(install_xml, 0)
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 503, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirt.libvirtError: virDomainCreateLinux() failed POST operation failed:
(xend.err "Error creating domain: (1, 'Internal error', 'launch_vm:
SETVCPUCONTEXT failed (rc=-1)\\n')")



On the system console:
----------------------
(XEN) mm.c:732:d0 vcpu 1 iip 0xa000000000010620: bad mpa 0x4092000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0x20000000004d33c0: bad mpa 0x4092000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0x20000000004d33c0: bad mpa 0x4092000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0x20000000004d33c0: bad mpa 0x4092000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0x20000000004d33c0: bad mpa 0x4092000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0x20000000004d33c0: bad mpa 0x4092000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0xa000000000010620: bad mpa 0x4093000000 (=>
0x407455d000)
(XEN) mm.c:732:d0 vcpu 1 iip 0x20000000004d33c0: bad mpa 0x4093000000 (=>
0x407455d000)


Trying some additional things...

Comment 28 Jarod Wilson 2008-02-18 21:40:41 UTC
The 'bad mpa' spew goes away if I reserve a bit more memory for the hypervisor,
but I'm still hitting the SETVCPUCONTEXT failed thing, even on a -81 kernel
w/only 1GB given to dom0. I suspect this is yet another different bug, and I
believe my fix will work to prevent the original panic problem.

Comment 29 Jarod Wilson 2008-02-18 21:55:41 UTC
Minor correction... I've already verified my fix does prevent the kernel panic.
I'm pretty sure it's a valid fix/workaround for the panic problem, and the guest
install failure is a separate issue.

Comment 30 Jarod Wilson 2008-02-18 22:07:52 UTC
Marking as a regression, since we didn't panic w/5.1 kernel-xen. Will post patch
for internal review shortly...

Comment 31 RHEL Program Management 2008-02-18 22:09:30 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 33 Jarod Wilson 2008-02-19 15:26:37 UTC
Patch posted for internal review.

Comment 34 Jarod Wilson 2008-02-19 20:42:48 UTC
Turns out this patch also remedies a problem Fujitsu was hitting with swiotlb
init causing a panic if all memory is allocated to dom0 on a 16G box... I
suspect this problem could hit just about any machine where dom0 was allocated
"all" memory...

Comment 36 Don Domingo 2008-04-02 02:14:13 UTC
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 37 Don Zickus 2008-04-02 16:09:11 UTC
in kernel-2.6.18-88.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 38 Jarod Wilson 2008-04-03 16:34:32 UTC
(In reply to comment #20)
> i didn't get what triggers this error, but adding the following note to RHEL5.2
> release notes ("Known Issues") for your review:
> 
> <quote>
> (ia64) If your system has less than 2GB of memory, a kernel panic may occur (on
> the host's kernel) if you attempt to create a guest. When this occurs, use the
> kernel parameter dom0_mem=1G on the hypervisor kernel before retrying.
> </quote>
> 
> please advise if any further revisions are required. thanks!

This release note should be removed entirely now. We have a patch in the 5.2
kernel-xen that remedies the problem w/o the need for the work-around.

Comment 39 Don Domingo 2008-04-04 02:57:16 UTC
thanks. removing release note entirely.

Comment 42 errata-xmlrpc 2008-05-21 15:08:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


