Red Hat Bugzilla – Bug 204468
xen VM goes zombie
Last modified: 2007-11-30 17:11:41 EST
Description of problem:
After updating to the latest versions (with "yum update"), virtualisation has become
unstable. Before the update the system was very stable (I don't recall it ever
The system has 2 virtual machines running:
# xm list
Name          ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0       0      2268      1  r-----     54.9
Hammerkit      4       256      1  ------     15.3
sanakirja      3       512      1  ------    659.6
When the system has been running for a couple of hours, VM "sanakirja" goes into
zombie state and VM "Hammerkit" loses network connectivity (it can still be
reached with "xm console"). Networking in Dom0 is unaffected.
The zombie VM cannot be destroyed, nor is it possible to restart it. Normal
operation can be restored for a while by rebooting Dom0.
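To catch the moment a guest goes bad, the "xm list" output could be polled and parsed. The following is only a sketch, not a tested monitoring tool. It assumes the State column uses the flag letters documented for xm (r, b, p, s, c, d) and that a zombie domain is listed with a "Zombie-" name prefix; both may differ between Xen versions. The function names parse_xm_list and suspect_domains are made up for illustration.

```python
# Sample taken from the "xm list" output in this report.
SAMPLE = """\
# xm list
Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 2268 1 r----- 54.9
Hammerkit 4 256 1 ------ 15.3
sanakirja 3 512 1 ------ 659.6
"""


def parse_xm_list(text):
    """Parse `xm list` output into a list of dicts, one per domain."""
    domains = []
    for line in text.splitlines():
        # Split from the right so that only the last five columns are fixed.
        fields = line.rsplit(None, 5)
        if len(fields) != 6:
            continue  # blank line or the shell prompt line
        name, dom_id, mem, vcpus, state, secs = fields
        try:
            domains.append({
                "name": name,
                "id": int(dom_id),
                "mem_mib": int(mem),
                "vcpus": int(vcpus),
                "state": state,
                "time_s": float(secs),
            })
        except ValueError:
            continue  # column header or any other non-numeric row
    return domains


def suspect_domains(domains):
    """Return domains that look crashed, dying or zombie.

    Assumption: 'c' and 'd' in the State column mean crashed and dying,
    and zombies carry a "Zombie-" name prefix.
    """
    return [
        d for d in domains
        if d["name"].startswith("Zombie-") or set(d["state"]) & {"c", "d"}
    ]


if __name__ == "__main__":
    for dom in suspect_domains(parse_xm_list(SAMPLE)):
        print("suspect domain:", dom["name"], dom["state"])
```

Run against the healthy listing above, this prints nothing; it would have to be fed fresh "xm list" output periodically to be of any use.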
Version-Release number of selected component (if applicable):
DomU: (both VMs)
Crashes in less than 24 hours after rebooting the system.
Created attachment 135120 [details]
Output from xen-bugtool
I'd like to "vote" for this bug as well.
I've seen this one too with the 2.6.18-rc3 based Xen-enabled kernel, on x86_64
(single CPU). One domU crashed for no obvious reason, networking to the other two
went dead, and xend failed to restart the domain (restarting too fast). When I
logged in, "xm list" told me "Domain 0 not connected". After a restart of xend I
could see the two other domUs, and attaching a console worked. Killing them left
the vif* devices lying around, and starting new domains was impossible due to
"hotplug problems". Before the incident hotplug worked fine, though. Only a reboot
got the machine out of that state.
I've seen this on two distinct hosts; one host even showed the phenomenon three
times a day, after two weeks without problems. I looked into all the logs but
found nothing special there; they look identical to the xenbug.tar.gz posted here.
On the xen-users mailing list, Adrian Chadd <email@example.com> also saw
this bug. I find random crashes and an afterwards unusable xend rather worrying.
So, I'd propose to raise the severity to high.
Just adding that I'm seeing identical behaviour with the 2.6.17-1.2187_FC5
kernel. Hardware is an Athlon XP 2200 with an RTL-8169 Gig NIC.
I'm curious whether anyone has left 'xm console' attached to the relevant XenUs
and seen a kernel panic before things went zombie.
It might be that this is a duplicate of
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944 . Please leave
your consoles on and see if the Zombie is caused by a bug in
See if the last lines say something like, "EIP: [<e10f9206>]
network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651edc
<0>Kernel panic - not syncing: Fatal exception in interrupt"
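If the console output is being saved to a file, a check for that signature could be automated. This is a rough sketch that matches only the two patterns quoted above (the final "EIP: [<addr>] symbol+off/len [module]" line and the "Kernel panic - not syncing" line); real oops output varies between kernel versions. The name find_panic is hypothetical.

```python
import re

# "Kernel panic - not syncing: <reason>" as quoted in this report.
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.*)")

# "EIP: [<addr>] symbol+0xoff/0xlen [module]"; the module part is optional
# and whitespace (including a wrapped line) may separate the pieces.
EIP_RE = re.compile(
    r"EIP:\s*\[<[0-9a-f]+>\]\s*(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

# Sample built from the lines quoted in this report.
SAMPLE = (
    "EIP: [<e10f9206>]\n"
    "network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651edc\n"
    "<0>Kernel panic - not syncing: Fatal exception in interrupt"
)


def find_panic(console_text):
    """Return (reason, symbol, module) for the first panic, or None.

    symbol and module are None when no matching EIP line is found.
    """
    panic = PANIC_RE.search(console_text)
    if panic is None:
        return None
    reason = panic.group(1).strip()
    eip = EIP_RE.search(console_text)
    if eip is None:
        return (reason, None, None)
    return (reason, eip.group(1), eip.group(2))


if __name__ == "__main__":
    print(find_panic(SAMPLE))
```

A result naming network_tx_buf_gc in the xennet module would suggest the same crash as the one quoted here.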
Yes, that seems to be the bug. The description matches exactly (I haven't had a
console open though). The problems always started on an Apache machine, while
the mail server machines worked flawlessly. Since the update to an -rc6 based
kernel from the development tree, I haven't seen the crash any more. I have seen
the zombie state, though, after aggressively destroying all DomUs at once (which
needed a reboot). I think someone should also look into why it's possible for a
DomU to take the whole networking down and for Xen to get into an inconsistent
Created attachment 136944 [details]
Console output from other domain when one went to zombie
Created attachment 137006 [details]
Console output from crashing domain
Created attachment 137145 [details]
Console output from crashing domain, 2nd time
Created attachment 137153 [details]
Console output from crashing domain, 3rd time
Are there any plans to fix this, for example by reverting to an older version,
or is the FC5 & Xen combination doomed? Currently it's not even usable for
testing, as one domU will crash the whole system, so that 24 hours of uptime is a miracle.
kernel-2.6.18-1.2189.fc5 is available in testing but the changelog reports that
some new xen userspace tools are needed which will be available shortly.
*** This bug has been marked as a duplicate of 199944 ***