Red Hat Bugzilla – Bug 174192
kernel-xen-hypervisor unstable (lockups) on HP Proliant G3 (single CPU, 1G RAM)
Last modified: 2007-11-30 17:11:17 EST
Description of problem:
Xen (both when running only the hypervisor kernel and when also running guests)
seems unstable on the HP Proliant DL380 G3. Regular non-xen kernels are fine.
Version-Release number of selected component (if applicable):
Lock up usually occurs within 10 minutes of boot.
I've got this stack trace once:
login: kernel BUG at:arch/xen/i386/mm/hypervisor.c:381 (xen_create_contiguous_region)! s es:
I have also seen two other lockups (no network response, console dead), and one
where the box responded to pings but TCP connections hung and the console was
wedged (this was before I disabled SMP).
title Fedora Core (2.6.12-1.13_FC5.small_thypervisor)
kernel /boot/xen.gz-2.6.12-1.13_FC5.small_t com1=115200,8n1,0x408,4
module /boot/vmlinuz-2.6.12-1.13_FC5.small_thypervisor ro root=LABEL=/
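For readers unfamiliar with the serial console option on the xen.gz line above,
here is a rough Python sketch that splits it into its fields. It assumes the
four-field form used in this report (baud rate, data bits/parity/stop bits, I/O
base, IRQ). The names SerialConfig and parse_com_option are hypothetical and
exist only for illustration; other forms the hypervisor may accept are not
handled.

```python
from dataclasses import dataclass


@dataclass
class SerialConfig:
    """Fields of a 'comN=<baud>,<DPS>,<io_base>,<irq>' option (illustrative)."""
    baud: int
    data_bits: int
    parity: str
    stop_bits: int
    io_base: int
    irq: int


def parse_com_option(option: str) -> SerialConfig:
    """Parse the four-field serial option as written in this report.

    Assumption: exactly four comma-separated fields are present.
    """
    name, _, value = option.partition("=")
    if name not in ("com1", "com2") or not value:
        raise ValueError(f"not a comN= option: {option!r}")

    fields = value.split(",")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {option!r}")

    baud, dps, io_base, irq = fields
    if len(dps) != 3:
        raise ValueError(f"expected a 3-character DPS field, got {dps!r}")

    return SerialConfig(
        baud=int(baud),
        data_bits=int(dps[0]),   # e.g. '8'
        parity=dps[1],           # e.g. 'n' for no parity
        stop_bits=int(dps[2]),   # e.g. '1'
        io_base=int(io_base, 0), # base 0 accepts the '0x' prefix
        irq=int(irq),
    )


if __name__ == "__main__":
    print(parse_com_option("com1=115200,8n1,0x408,4"))
```

Read this way, the option in the stanza above asks for 115200 baud, 8 data bits,
no parity, 1 stop bit, I/O base 0x408 and IRQ 4.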
The kernel is as the SRPMS from
except that the kernel config has been modified to build the cciss driver into
the kernel (instead of as a module, as in the stock config) - this was done as
part of some earlier debugging.
$ uname -a ; cat /proc/meminfo ; cat /proc/cpuinfo ; lspci ; cat /proc/devices
Linux xyz.xyz 2.6.12-1.13_FC5.xyzhypervisor #1 SMP Thu Nov 24 15:31:00 GMT 2005
i686 i686 i386 GNU/Linux
MemTotal: 393216 kB
MemFree: 308632 kB
Buffers: 6116 kB
Cached: 36784 kB
SwapCached: 0 kB
Active: 30832 kB
Inactive: 23184 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 393216 kB
LowFree: 308632 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 12 kB
Writeback: 0 kB
Mapped: 18512 kB
Slab: 9092 kB
CommitLimit: 196608 kB
Committed_AS: 90792 kB
PageTables: 860 kB
VmallocTotal: 121752 kB
VmallocUsed: 2064 kB
VmallocChunk: 119252 kB
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 7
cpu MHz : 2785.128
cache size : 512 KB
fdiv_bug : no
hlt_bug : yes
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 5557.45
00:00.0 Host bridge: Broadcom (formerly ServerWorks) CMIC-LE Host Bridge (GC-LE
chipset) (rev 33)
00:00.1 Host bridge: Broadcom (formerly ServerWorks) CMIC-LE Host Bridge (GC-LE
00:00.2 Host bridge: Broadcom (formerly ServerWorks) CMIC-LE Host Bridge (GC-LE
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out
Controller (rev 01)
00:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out
Processor (rev 01)
00:0f.0 ISA bridge: Broadcom (formerly ServerWorks) CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: Broadcom (formerly ServerWorks) CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: Broadcom (formerly ServerWorks) OSB4/CSB5 OHCI USB
Controller (rev 05)
00:0f.3 Host bridge: Broadcom (formerly ServerWorks) CSB5 LPC bridge
00:10.0 Host bridge: Broadcom (formerly ServerWorks) CIOB-X2 PCI-X I/O Bridge
00:10.2 Host bridge: Broadcom (formerly ServerWorks) CIOB-X2 PCI-X I/O Bridge
00:11.0 Host bridge: Broadcom (formerly ServerWorks) CIOB-X2 PCI-X I/O Bridge
00:11.2 Host bridge: Broadcom (formerly ServerWorks) CIOB-X2 PCI-X I/O Bridge
01:03.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit
Ethernet (rev 02)
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit
Ethernet (rev 02)
03:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)
03:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)
06:02.0 USB Controller: NEC Corporation USB (rev 43)
06:02.1 USB Controller: NEC Corporation USB (rev 43)
06:02.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
06:1e.0 PCI Hot-plug controller: Compaq Computer Corporation PCI Hotplug
Controller (rev 14)
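Since the box tends to lock up within minutes of boot, it may help to capture
the same information in one step right after boot. The following is a minimal
Python sketch that runs the commands from the listing above and saves their
output. The collect function and the output file name xen-debug-report.txt are
hypothetical, chosen only for this example.

```python
import subprocess
from pathlib import Path

# The same commands as in the listing above.
COMMANDS = [
    ["uname", "-a"],
    ["cat", "/proc/meminfo"],
    ["cat", "/proc/cpuinfo"],
    ["lspci"],
    ["cat", "/proc/devices"],
]


def collect(commands=COMMANDS) -> str:
    """Run each command and return one text report with a '$ cmd' header each."""
    sections = []
    for cmd in commands:
        header = "$ " + " ".join(cmd)
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=30
            )
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool (e.g. lspci not installed) should not abort the run.
            body = f"(failed: {exc})\n"
        sections.append(f"{header}\n{body}")
    return "\n".join(sections)


if __name__ == "__main__":
    report = collect()
    # Hypothetical file name; write it somewhere that survives a reboot.
    Path("xen-debug-report.txt").write_text(report)
    print(report)
```

A command that is missing or times out is recorded in the report rather than
stopping the collection, so a partial report is still written.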
I have rebuilt, with this patch:
and a UP kernel, and I'm currently testing it.
The new UP kernel doesn't seem to have helped much. I've not seen the original
stack trace recur, but domain0 on the machine locks up, usually within 15
minutes of boot up. Xen itself is still running, I think.
If I can usefully collect any more debug info for this machine, it'll need to be
done today, if possible, as I won't have access to the box for the next 2 weeks.
The latest Xen kernels include a couple of fixes regarding swiotlb enabling,
which should result in the provision of a small swiotlb on all dom0s. Xen
should be able to use this as a fallback for cases where it can't generate
physically contiguous dma map requests. Can you see if the problem still
I still get lockups (maximum uptime 15 minutes, on an idle system) on these
machines, using:
It looks like the hypervisor is locking up, as I can't seem to get anything out
of it on the serial console (although I'm limited to getting serial console I/O
using HP '"Intelligent" lights out' so I don't have 100% confidence in this).
I'm booting with:
kernel /boot/xen.gz-2.6.15-1.29_FC5.xxx com1=115200,8n1,0x408,4 nosmp watchdog debug
but haven't seen any more debug traces since the original posting (it may be
that the two issues are unrelated). Any further suggestions to get more debug
info would be welcome, although I have limited time to get more info from this
machine, as I'm under pressure to put it back in service for user sessions (i.e.
try using Xen2, or dump Xen altogether).
Yes, a hard lockup definitely does not sound like the oops you were generating
initially --- sounds like a new problem.
I've just pushed a rebased, more recent linux-2.6-merge.hg and hypervisor to
rawhide, and that should show up tomorrow as
kernel-xen-hypervisor-2.6.15-1.33_FC5. Does that show anything different?
We don't currently build the hypervisor with the debug options enabled ---
that's something I'm likely to turn on soon, which may help here.
Good results so far: uptime of ~4 hours, and a couple of guests created. I
haven't done any serious stress testing yet, but so far it looks good!
Thanks! Might be an idea to close this bug, and I'll reopen if I get any
BTW, previous version tested (which exhibited this bug) was
If you would like me to try and nail this bug to a particular change, please let
me know, and I'll see what I can do.
OK. I've got reasonable confidence that the networking contiguous-region bug
you originally reported should now be fixed, so I'll go ahead and close this.
If you do get further occurrences of the hang you saw later on, it would
probably be best to open a separate bug for that to distinguish it from the
original networking bug.
Thanks for the testing!