Bug 204468

Summary: xen VM goes zombie
Product: [Fedora] Fedora
Reporter: Jussi Siponen <jussi.siponen>
Component: xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: bstein, christophe, katzj, kwan, russell, ville.lindfors
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Last Closed: 2006-09-27 06:35:12 EDT
Attachments:
  Output from xen-bugtool
  Console output from other domain when one went to zombie
  Console output from crashing domain
  Console output from crashing domain, 2nd time
  Console output from crashing domain, 3rd time

Description Jussi Siponen 2006-08-29 08:32:13 EDT
Description of problem:

After updating to the latest versions (with "yum update"), virtualisation has become
unstable. Before update the system was very stable (I don't recall it ever

The system has 2 virtual machines running:

# xm list
Name                              ID Mem(MiB) VCPUs State  Time(s)
Domain-0                           0     2268     1 r-----    54.9
Hammerkit                          4      256     1 ------    15.3
sanakirja                          3      512     1 ------   659.6

When the system has been running for a couple of hours, VM "sanakirja" goes into
zombie state and VM "Hammerkit" loses network connectivity (it can still be
reached with "xm console"). Networking in Dom0 is unaffected.

The zombie VM cannot be destroyed, nor is it possible to restart it. Normal
operation can be restored for a while by rebooting Dom0.
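
For anyone who wants to log domain state periodically while waiting for the
failure, here is a minimal Python sketch that parses the "xm list" table shown
above into records. It assumes the six-column layout from this report (Name,
ID, Mem, VCPUs, State, Time); the function name parse_xm_list is made up for
illustration and is not part of the Xen tools.

```python
def parse_xm_list(text):
    """Parse the table printed by `xm list` into a list of dicts.

    Assumes the six-column layout shown in this report; other Xen
    versions may print different columns.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, a pasted shell prompt line, and the header row.
        if not line or line.startswith("#"):
            continue
        if line.split()[:2] == ["Name", "ID"]:
            continue
        # Split from the right so only the last five fields are fixed.
        name, dom_id, mem, vcpus, state, secs = line.rsplit(None, 5)
        rows.append({
            "name": name,
            "id": int(dom_id),
            "mem_mib": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time_s": float(secs),
        })
    return rows
```

Saving the parsed rows with a timestamp every few minutes would show when a
domain's state or CPU time stopped changing.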

Version-Release number of selected component (if applicable):


DomU: (both VMs)

How reproducible:

Crashes in less than 24 hours after rebooting the system.
Comment 1 Jussi Siponen 2006-08-29 08:32:13 EDT
Created attachment 135120 [details]
Output from xen-bugtool
Comment 2 Christophe Saout 2006-09-19 17:37:28 EDT
I'd like to "vote" for this bug as well.

I've seen this one too with the 2.6.18-rc3 based XEN-enabled kernel, on x86_64
(single CPU). One domU crashed without obvious reason, network to the other two
went dead, and xend would fail to restart the domain (restarting too fast). When
I logged in, "xm list" told me "Domain 0 not connected". A restart of xend made me
see the two other domU's, attaching console worked. Killing them left the vif*
devices lying around, and starting new domains was impossible due to "hotplug
problems". Before the incident hotplug worked fine though. Only a reboot helped
the machine out of that state.

I've seen this on two distinct hosts, one host even showed the phenomenon three
times a day, after two weeks without problems. I looked into all logs, but
nothing special there, looks identical to the xenbug.tar.gz posted here.

On the xen-users mailing list, Adrian Chadd <adrian@creative.net.au> also saw
this bug. I find random crashes and an afterwards unusable xend rather worrying.

So, I'd propose to raise the severity to high.
Comment 3 Kwan Lowe 2006-09-20 21:15:05 EDT
Just adding that I'm seeing identical behaviour with the 2.6.17-1.2187_FC5
kernel. Hardware is an Athlon XP 2200 with an RTL-8169 Gig NIC.   
Comment 4 Russell McOrmond 2006-09-21 11:21:50 EDT
I'm curious if people have left their 'xm console' on for the relevant XenU's
and seen if there was a kernel panic before things went to a Zombie?

It might be that this is a duplicate of
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944  .  Please leave
your consoles on and see if the Zombie is caused by a bug in 

See if last lines say something like, "EIP: [<e10f9206>]
network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651edc
 <0>Kernel panic - not syncing: Fatal exception in interrupt"
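
If the console output is captured to a file, a check for this signature could
be sketched in Python roughly as follows. The regular expressions are modelled
only on the lines quoted above, and the function name find_panic is
hypothetical; other panics may be formatted differently.

```python
import re

# Patterns modelled on the quoted console lines; not a general oops parser.
PANIC_RE = re.compile(r"Kernel panic - not syncing: (?P<reason>.*)")
EIP_RE = re.compile(
    r"EIP: \[<(?P<addr>[0-9a-f]+)>\]\s*(?P<symbol>\S+)\s+\[(?P<module>\w+)\]"
)


def find_panic(console_text):
    """Return panic details from captured console text, or None."""
    panic = PANIC_RE.search(console_text)
    if panic is None:
        return None
    # The EIP line names the faulting function and module, if present.
    eip = EIP_RE.search(console_text)
    return {
        "reason": panic.group("reason").strip(),
        "symbol": eip.group("symbol") if eip else None,
        "module": eip.group("module") if eip else None,
    }
```

A result whose symbol starts with network_tx_buf_gc and whose module is xennet
would match the signature described in this comment.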

Comment 5 Christophe Saout 2006-09-21 12:26:22 EDT
Yes, that seems to be the bug. The description matches exactly (I haven't had a
console open though). The problems always started on an Apache machine, while
the mail server machines worked flawlessly. Since the update to a -rc6 based
kernel from the development tree, I didn't see the crash any more. I have seen
the Zombie thing though after aggressively destroying all DomU's at once (which
needed a reboot). I think someone should also look into why it's possible for a
DomU to take the whole networking down and for Xen to get into an inconsistent
Comment 6 Ville Lindfors 2006-09-22 10:39:13 EDT
Created attachment 136944 [details]
Console output from other domain when one went to zombie
Comment 7 Ville Lindfors 2006-09-23 15:35:15 EDT
Created attachment 137006 [details]
Console output from crashing domain
Comment 8 Ville Lindfors 2006-09-26 10:49:06 EDT
Created attachment 137145 [details]
Console output from crashing domain, 2nd time
Comment 9 Ville Lindfors 2006-09-26 13:13:08 EDT
Created attachment 137153 [details]
Console output from crashing domain, 3rd time
Comment 10 Ville Lindfors 2006-09-26 13:41:24 EDT
Are there any plans to fix this, for example by reverting to an older version,
or is the FC5 & Xen combination doomed? Currently it's not even usable for
testing, as one domU will crash the whole system so that 24 hour uptime is a miracle.
Comment 11 Kwan Lowe 2006-09-26 13:52:32 EDT
kernel-2.6.18-1.2189.fc5 is available in testing but the changelog reports that
some new xen userspace tools are needed which will be available shortly.
Comment 12 Herbert Xu 2006-09-27 06:35:12 EDT

*** This bug has been marked as a duplicate of 199944 ***