Bug 249074 - "xm list" no longer works after domU attempted to start but crashed
Summary: "xm list" no longer works after domU attempted to start but crashed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 5.6
Assignee: Michal Novotny
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514500
Reported: 2007-07-20 18:58 UTC by Dave Wysochanski
Modified: 2014-02-02 22:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-22 07:32:29 UTC
Target Upstream Version:
Embargoed:


Attachments
xend log file (318.65 KB, text/plain)
2007-07-20 19:16 UTC, Dave Wysochanski
fc6 xen config file (389 bytes, text/plain)
2007-07-20 19:17 UTC, Dave Wysochanski

Description Dave Wysochanski 2007-07-20 18:58:27 UTC
Description of problem:
I'm running a variety of domU's on my Dell x86_64 workstation (rhel4u5, fc6,
fc7, etc).  I have been playing with some of the domU config files and I've
noticed the following.  If I start one of the domUs and it crashes on boot, I
can get into a state where "xm list" only produces the following error output:
# xm list
Error: Boot loader didn't return any data!
Usage: xm list [options] [Domain, ...]

List information about all/some domains.
  -l, --long                     Output all VM details in SXP               
  --label                        Include security labels                    


The normal way I'm starting a domU is something like this:
screen -t "fc6" -d -m -S fc6 xm create -c fc6

Once I get into this state, I cannot monitor my other domains (even though I can
verify they are still running by switching to their 'screen' console), start new
domains, etc.



Version-Release number of selected component (if applicable):
# rpm -qa | grep xen
xen-libs-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.6.el5
kernel-xen-2.6.18-8.el5
xen-devel-3.0.3-25.0.3.el5
xen-devel-3.0.3-25.0.3.el5
kernel-xen-devel-2.6.18-8.1.8.el5
xen-libs-3.0.3-25.0.3.el5
kernel-xen-devel-2.6.18-8.el5
xen-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-devel-2.6.18-8.1.6.el5


How reproducible:
Fairly reproducible.  I might be able to come up with a test sequence to get it
into this state but have not tried.


Steps to Reproduce:
1. Start a domU (successfully).  In my latest case, I had 2 rhel4u5 domU's and a
FC7 domU running.
2. Attempt to start another domU but have it crash.  This last time I specified
an incorrect "root=" line in the xen config file.  I was trying to start a FC6 domU.
3. Note that "xm list" continually produces errors.
  
Actual results:
"xm" commands produce errors.

Expected results:
"xm" commands should still work, at least for the other domains that are still
running.
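
The expected behaviour can be illustrated with a minimal Python sketch. It is not xend code: the `Domain` class, its `refresh()` method and `list_domains()` are hypothetical names invented for this illustration, and the domain names are only examples. The point is that a failure while updating one domain becomes an error row instead of aborting the whole listing.

```python
class Domain:
    """Hypothetical stand-in for a managed domain; not xend's real class."""

    def __init__(self, name, refresh_error=None):
        self.name = name
        self.refresh_error = refresh_error  # message to raise on refresh, if any

    def refresh(self):
        # Stand-in for whatever state update happens before listing.
        if self.refresh_error:
            raise RuntimeError(self.refresh_error)
        return "running"


def list_domains(domains):
    """Return (name, state) rows; a failing domain yields an error row
    instead of aborting the whole listing."""
    rows = []
    for dom in domains:
        try:
            rows.append((dom.name, dom.refresh()))
        except RuntimeError as exc:
            rows.append((dom.name, "error: %s" % exc))
    return rows


if __name__ == "__main__":
    doms = [
        Domain("rhel4u5"),
        Domain("fc6", refresh_error="Boot loader didn't return any data!"),
        Domain("fc7"),
    ]
    for name, state in list_domains(doms):
        print("%-10s %s" % (name, state))
```

With this shape, the healthy domains are still reported even when one domain cannot be refreshed.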

Comment 1 Daniel Berrangé 2007-07-20 19:02:21 UTC
Please provide /var/log/xen/xend.log and the guest config file


Comment 2 Dave Wysochanski 2007-07-20 19:16:18 UTC
Created attachment 159678
xend log file

Comment 3 Dave Wysochanski 2007-07-20 19:17:20 UTC
Created attachment 159679
fc6 xen config file

The "root=" line was commented out when it crashed.

Comment 4 Daniel Berrangé 2007-07-21 00:14:15 UTC
Ok, I think I know what's going on here.

 - The domain crashed at some point.
 - The domain config file has     'on_crash': 'restart'
 - The domain status is updated every time you run 'xm list'
 - So every time you run 'xm list', it notices the crashed domain, tries to
restart it, fails, and aborts the entire 'xm list' command.

Clearly, what it should be doing is:

 - Try to restart it once. If that fails, destroy it.
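
A rough Python sketch of that "restart once, otherwise destroy" policy follows. It is only an illustration under stated assumptions, not the actual xend fix: `CrashedDomain`, `handle_crash()` and the `restart` callable are hypothetical names. Because a failed restart moves the domain to "destroyed", later status polls see a non-crashed state and do not retry.

```python
class CrashedDomain:
    """Hypothetical record for a crashed domain; not part of xend."""

    def __init__(self, name):
        self.name = name
        self.state = "crashed"


def handle_crash(dom, restart):
    """Apply 'restart once, else destroy' to a domain.

    `restart` is a caller-supplied callable that raises on failure.
    Returns the resulting state string.
    """
    if dom.state != "crashed":
        return dom.state  # nothing to do on later polls
    try:
        restart(dom)
    except Exception:
        # Give up instead of retrying on every status poll.
        dom.state = "destroyed"
    else:
        dom.state = "running"
    return dom.state


if __name__ == "__main__":
    attempts = []

    def failing_restart(dom):
        attempts.append(dom.name)
        raise RuntimeError("Boot loader didn't return any data!")

    fc6 = CrashedDomain("fc6")
    for _ in range(3):  # simulate three consecutive status polls
        print(handle_crash(fc6, failing_restart))
    print("restart attempts:", len(attempts))
```

The demo polls three times but attempts the restart only once, which is the property the list above asks for.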


Comment 5 Markus Armbruster 2007-07-21 04:54:30 UTC
Re comment#4: Should on_crash: restart really destroy the domain (like on_crash:
destroy) when restarting fails, or should it preserve it (like on_crash: preserve)?


Comment 6 Dave Wysochanski 2007-07-21 23:52:53 UTC
Interesting.  As a workaround, I tried setting on_crash to destroy in the config
file and then running "xm list", but I'm still getting the same error.

I'm wondering if there's a workaround from here or if I have to reboot the
machine to get back to a usable "xm list".

Thanks.

Comment 7 Dave Wysochanski 2007-08-10 17:30:28 UTC
Ran into this problem again, with a slightly different scenario.

This time, I was playing around with using domU's via a mounted USB hd.  The USB
hd had ntfs on it, and I was using fuse to mount it.  Everything _seemed_ to be
fine with the mount point and the drive.  I copied my domU files from my local
HD to this USB drive mount point, then edited my xen config file to look like this:
# Automatically generated xen config file
name = "rhel4u5-node3"
memory = "384"
disk = [ 'tap:aio:/mnt/sdb1/xen/images/rhel4u5-node3,xvda,w', ]
#disk = [ 'tap:aio:/mnt/xen-images/rhel4u5-node3,xvda,w', ]
vif = [ 'mac=00:16:3E:40:A9:83, bridge=br1', ]
#vfb = ["type=vnc,vncunused=1"]
uuid = "b39cc0c0-70e2-d085-889a-deadbeef0de3"
bootloader="/usr/bin/pygrub"
vcpus=2
on_reboot   = 'restart'
on_crash    = 'destroy'
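
Since an xm config file like the one above consists of plain Python assignments, its lifecycle settings can be checked with a short script. This is a sketch only: `read_lifecycle()` is a hypothetical helper written for this illustration, not part of the xen tools, and it uses `exec()`, so it should only be pointed at config files you trust.

```python
def read_lifecycle(config_text):
    """Evaluate an xm-style config (plain Python assignments) and return
    its on_* lifecycle settings as a dict.

    exec() runs whatever code the text contains, so use trusted input only.
    """
    settings = {}
    exec(config_text, {}, settings)  # assignments land in `settings`
    return {k: v for k, v in settings.items() if k.startswith("on_")}


if __name__ == "__main__":
    sample = '''
name = "rhel4u5-node3"
memory = "384"
on_reboot   = 'restart'
on_crash    = 'destroy'
'''
    for key, value in read_lifecycle(sample).items():
        print(key, "=", value)
```

Running it over each guest config is one way to spot domains that still have on_crash set to 'restart'.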


The domain started to boot, but hung at the "Reading all physical volumes." line
(which actually comes from LVM scanning).  Here's the snippet from the console:
rtc: IRQ 8 is not free.
i8042.c: No controller found.
rtc: IRQ 8 is not free.
i8042.c: No controller found.
Red Hat nash version 4.2.1.10 starting
  Reading all physical volumes.  This may take a while...


In another window I then ran "xm shutdown <dom#>", and now I have the broken
"xm list" again.

Comment 13 Michal Novotny 2010-06-14 09:44:00 UTC
Any luck reproducing it, Dave?

Michal

Comment 14 Dave Wysochanski 2010-07-21 17:53:31 UTC
I am unable to reproduce on later RPMs.

I attempted to reproduce this with a configuration similar to my original report - an invalid "root=" line and "on_crash = 'restart'".  I did see "xm list" fail repeatedly for a period of time, but now it is fine.

The RPMs on my machine now are:
# rpm -qa | grep xen | sort
kernel-xen-2.6.18-164.11.1.el5.x86_64
kernel-xen-2.6.18-164.2.1.el5.x86_64
kernel-xen-2.6.18-164.el5.x86_64
kernel-xen-devel-2.6.18-164.11.1.el5.x86_64
kernel-xen-devel-2.6.18-164.2.1.el5.x86_64
kernel-xen-devel-2.6.18-164.el5.x86_64
xen-3.0.3-94.el5_4.3.x86_64
xen-libs-3.0.3-94.el5_4.3.i386
xen-libs-3.0.3-94.el5_4.3.x86_64

# uname -a
Linux dhcp231-162.rdu.redhat.com 2.6.18-164.11.1.el5xen #1 SMP Wed Jan 6 13:43:3

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.4 (Tikanga)

Comment 15 Michal Novotny 2010-07-22 07:32:29 UTC
OK, if it's not reproducible with the -94 version of the xen package, then I'm closing this as CURRENTRELEASE.

Michal

