Bug 249074 - "xm list" no longer works after domU attempted to start but crashed
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.0
Hardware: x86_64 Linux
Priority: medium
Severity: medium
: ---
: 5.6
Assigned To: Michal Novotny
Virtualization Bugs
Blocks: 514500
Reported: 2007-07-20 14:58 EDT by Dave Wysochanski
Modified: 2014-02-02 17:36 EST
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2010-07-22 03:32:29 EDT


Attachments
xend log file (318.65 KB, text/plain)
2007-07-20 15:16 EDT, Dave Wysochanski
fc6 xen config file (389 bytes, text/plain)
2007-07-20 15:17 EDT, Dave Wysochanski

Description Dave Wysochanski 2007-07-20 14:58:27 EDT
Description of problem:
I'm running a variety of domUs on my Dell x86_64 workstation (rhel4u5, fc6,
fc7, etc.). I have been experimenting with some of the domU config files and have
noticed the following. If I start one of the domUs and it crashes on boot, I
can get into a state where "xm list" only produces the following error output:
# xm list
Error: Boot loader didn't return any data!
Usage: xm list [options] [Domain, ...]

List information about all/some domains.
  -l, --long                     Output all VM details in SXP
  --label                        Include security labels


The normal way I'm starting a domU is something like this:
screen -t "fc6" -d -m -S fc6 xm create -c fc6

Once I get into this state, I cannot monitor my other domains (even though I can
verify they are still running by switching to their 'screen' console), start new
domains, etc.



Version-Release number of selected component (if applicable):
# rpm -qa | grep xen
xen-libs-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.6.el5
kernel-xen-2.6.18-8.el5
xen-devel-3.0.3-25.0.3.el5
xen-devel-3.0.3-25.0.3.el5
kernel-xen-devel-2.6.18-8.1.8.el5
xen-libs-3.0.3-25.0.3.el5
kernel-xen-devel-2.6.18-8.el5
xen-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-devel-2.6.18-8.1.6.el5


How reproducible:
Fairly reproducible.  I might be able to come up with a test sequence to get it
into this state but have not tried.


Steps to Reproduce:
1. Start a domU (successfully). In my latest case, I had two rhel4u5 domUs and an
FC7 domU running.
2. Attempt to start another domU but have it crash. This last time I specified
an incorrect "root=" line in the xen config file. I was trying to start an FC6 domU.
3. Note that "xm list" repeatedly produces errors.

Actual results:
"xm" commands produce errors.

Expected results:
"xm" commands should still work, at least for the other domains that are still
running.
Comment 1 Daniel Berrange 2007-07-20 15:02:21 EDT
Please provide /var/log/xen/xend.log and the guest config file
Comment 2 Dave Wysochanski 2007-07-20 15:16:18 EDT
Created attachment 159678 [details]
xend log file
Comment 3 Dave Wysochanski 2007-07-20 15:17:20 EDT
Created attachment 159679 [details]
fc6 xen config file

The "root=" line was commented out when it crashed.
Comment 4 Daniel Berrange 2007-07-20 20:14:15 EDT
Ok, I think I know what's going on here.

 - The domain crashed at some point.
 - The domain config file has     'on_crash': 'restart'
 - The domain status is updated every time you run 'xm list'
 - So every time you run xm list, it notices a crashed domain, tries to restart
it, fails, and aborts the entire xm list command.

Clearly what it should be doing is

 - Try to restart it once. If that fails, destroy it.
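The policy proposed above can be sketched in Python, the language xend itself is written in. This is a minimal, hypothetical model, not xend's actual code: `Domain`, `RestartFailed`, `refresh_domains`, and the `restart`/`destroy` callbacks are illustrative names only. It shows the two points from this comment: a crashed domain with `on_crash: restart` gets a single restart attempt, and if that attempt fails the domain is destroyed rather than letting the error abort the whole listing. Other `on_crash` values are not modeled.

```python
# Hypothetical sketch: none of these names come from xend itself.
from dataclasses import dataclass
from typing import Callable, List


class RestartFailed(Exception):
    """Raised by the (hypothetical) restart hook when a guest cannot be rebuilt."""


@dataclass
class Domain:
    name: str
    state: str                # e.g. "running" or "crashed"
    on_crash: str = "restart"


def refresh_domains(domains: List[Domain],
                    restart: Callable[[Domain], None],
                    destroy: Callable[[Domain], None]) -> List[Domain]:
    """Update domain states before listing them.

    A crashed domain with on_crash='restart' gets exactly one restart
    attempt. If that attempt fails, the domain is destroyed and dropped
    from the result instead of the exception aborting the whole listing.
    """
    survivors = []
    for dom in domains:
        if dom.state == "crashed" and dom.on_crash == "restart":
            try:
                restart(dom)              # the single restart attempt
                dom.state = "running"
            except RestartFailed as err:
                print(f"restart of {dom.name} failed ({err}); destroying it")
                destroy(dom)              # fall back instead of retrying
                continue                  # keep listing the other domains
        survivors.append(dom)
    return survivors


if __name__ == "__main__":
    def failing_restart(dom: Domain) -> None:
        # Simulates a restart that fails like the one in this report.
        raise RestartFailed("Boot loader didn't return any data!")

    destroyed: List[Domain] = []
    doms = [Domain("rhel4u5-node3", "running"), Domain("fc6", "crashed")]
    listed = refresh_domains(doms, failing_restart, destroyed.append)
    print([d.name for d in listed])  # ['rhel4u5-node3']
```

The key design point is that the failure is caught per domain, so one broken guest cannot take down monitoring of the healthy ones. Choosing a different fallback (for example preserving the crashed domain, as Comment 5 asks) would only change the `except` branch.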
Comment 5 Markus Armbruster 2007-07-21 00:54:30 EDT
Re comment #4: Should on_crash: restart really destroy the domain (like on_crash:
destroy) when restarting fails, or should it preserve it (like on_crash: preserve)?
Comment 6 Dave Wysochanski 2007-07-21 19:52:53 EDT
Interesting.  As a workaround, I tried setting on_crash to destroy in the config
file and then running "xm list", but I'm still getting the same error.

I'm wondering if there's a workaround from here or if I have to reboot the
machine to get back to a usable "xm list".

Thanks.
Comment 7 Dave Wysochanski 2007-08-10 13:30:28 EDT
Ran into this problem again, with a slightly different scenario.

This time, I was experimenting with running domUs from a USB hard drive. The USB
drive was formatted NTFS, and I was using FUSE to mount it. Everything _seemed_ to be
fine with the mount point and the drive. I copied my domU files from my local
hard drive to the USB drive mount point, then edited my xen config file to look like this:
# Automatically generated xen config file
name = "rhel4u5-node3"
memory = "384"
disk = [ 'tap:aio:/mnt/sdb1/xen/images/rhel4u5-node3,xvda,w', ]
#disk = [ 'tap:aio:/mnt/xen-images/rhel4u5-node3,xvda,w', ]
vif = [ 'mac=00:16:3E:40:A9:83, bridge=br1', ]
#vfb = ["type=vnc,vncunused=1"]
uuid = "b39cc0c0-70e2-d085-889a-deadbeef0de3"
bootloader="/usr/bin/pygrub"
vcpus=2
on_reboot   = 'restart'
on_crash    = 'destroy'


The domain started to boot, but hung at the "Reading all physical volumes." line
(from LVM scanning, actually). Here's a snippet of the console:
rtc: IRQ 8 is not free.
i8042.c: No controller found.
rtc: IRQ 8 is not free.
i8042.c: No controller found.
Red Hat nash version 4.2.1.10 starting
  Reading all physical volumes.  This may take a while...


In another window I then ran "xm shutdown <dom#>", and "xm list" was broken
again.
Comment 13 Michal Novotny 2010-06-14 05:44:00 EDT
Any luck reproducing it, Dave?

Michal
Comment 14 Dave Wysochanski 2010-07-21 13:53:31 EDT
I am unable to reproduce on later RPMs.

I attempted to reproduce this with a configuration similar to my original report - an invalid "root=" line and "on_crash = 'restart'".  I did see "xm list" fail repeatedly for a period of time, but now it is fine.

The RPMs on my machine now are:
# rpm -qa | grep xen | sort
kernel-xen-2.6.18-164.11.1.el5.x86_64
kernel-xen-2.6.18-164.2.1.el5.x86_64
kernel-xen-2.6.18-164.el5.x86_64
kernel-xen-devel-2.6.18-164.11.1.el5.x86_64
kernel-xen-devel-2.6.18-164.2.1.el5.x86_64
kernel-xen-devel-2.6.18-164.el5.x86_64
xen-3.0.3-94.el5_4.3.x86_64
xen-libs-3.0.3-94.el5_4.3.i386
xen-libs-3.0.3-94.el5_4.3.x86_64

# uname -a
Linux dhcp231-162.rdu.redhat.com 2.6.18-164.11.1.el5xen #1 SMP Wed Jan 6 13:43:3

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Comment 15 Michal Novotny 2010-07-22 03:32:29 EDT
OK, since it's not reproducible with the -94 version of the xen package, closing this as CURRENTRELEASE.

Michal
