Description of problem:

I'm running a variety of domU's on my Dell x86_64 workstation (rhel4u5, fc6, fc7, etc). I have been playing with some of the domU config files and I've noticed the following. If I start one of the domUs and it crashes on boot, I can get into a state where "xm list" only produces the following error output:

# xm list
Error: Boot loader didn't return any data!
Usage: xm list [options] [Domain, ...]

List information about all/some domains.
  -l, --long                     Output all VM details in SXP
  --label                        Include security labels

The normal way I'm starting a domU is something like this:

screen -t "fc6" -d -m -S fc6 xm create -c fc6

Once I get into this state, I cannot monitor my other domains (even though I can verify they are still running by switching to their 'screen' console), start new domains, etc.

Version-Release number of selected component (if applicable):

# rpm -qa | grep xen
xen-libs-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.6.el5
kernel-xen-2.6.18-8.el5
xen-devel-3.0.3-25.0.3.el5
xen-devel-3.0.3-25.0.3.el5
kernel-xen-devel-2.6.18-8.1.8.el5
xen-libs-3.0.3-25.0.3.el5
kernel-xen-devel-2.6.18-8.el5
xen-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-devel-2.6.18-8.1.6.el5

How reproducible:

Fairly reproducible. I might be able to come up with a test sequence to get it into this state but have not tried.

Steps to Reproduce:
1. Start a domU (successfully). In my latest case, I had 2 rhel4u5 domU's and a FC7 domU running.
2. Attempt to start another domU but have it crash. This last time I specified an incorrect "root=" line in the xen config file. I was trying to start a FC6 domU.
3. Note that "xm list" continually produces errors.

Actual results:

"xm" commands produce errors.

Expected results:

"xm" commands should still work, at least for the other domains that are still running.
Please provide /var/log/xen/xend.log and the guest config file
Created attachment 159678 [details] xend log file
Created attachment 159679 [details] fc6 xen config file The "root=" line was commented out when it crashed.
Ok, I think I know what's going on here.

 - The domain crashed at some point.
 - The domain config file has 'on_crash': 'restart'
 - The domain status is updated every time you run 'xm list'
 - So every time you run xm list it notices a crashed domain, tries to restart it, fails, and aborts the entire xm list command.

Clearly what it should be doing is

 - Try to restart it once. If that fails, destroy it.
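To make the proposed behaviour concrete, here is a minimal Python sketch of "try to restart once, and if that fails, destroy it" applied while listing domains. It is only an illustration under stated assumptions: Domain, RestartError, refresh and list_domains are hypothetical names invented for this sketch, not xend's actual classes or functions, and the "bootable" flag merely stands in for a config that cannot boot (such as a bad "root=" line).

```python
class RestartError(Exception):
    """Hypothetical error raised when a domain cannot be restarted."""


class Domain:
    """Toy stand-in for a guest domain; not xend's real data model."""

    def __init__(self, name, state="running", on_crash="restart", bootable=True):
        self.name = name
        self.state = state              # "running", "crashed" or "destroyed"
        self.on_crash = on_crash        # "restart", "destroy" or "preserve"
        self.bootable = bootable        # False models a config that cannot boot
        self.restart_attempted = False

    def restart(self):
        if not self.bootable:
            raise RestartError("Boot loader didn't return any data!")
        self.state = "running"

    def destroy(self):
        self.state = "destroyed"


def refresh(domain):
    """Apply the on_crash policy to one domain without ever raising."""
    if domain.state != "crashed" or domain.on_crash == "preserve":
        return
    if domain.on_crash == "restart" and not domain.restart_attempted:
        domain.restart_attempted = True
        try:
            domain.restart()
            return
        except RestartError:
            pass  # fall through: a failed restart is treated like 'destroy'
    domain.destroy()


def list_domains(domains):
    """Refresh every domain, then report the ones that still exist."""
    for domain in domains:
        refresh(domain)
    return [(d.name, d.state) for d in domains if d.state != "destroyed"]


if __name__ == "__main__":
    doms = [Domain("rhel4u5-node1"),
            Domain("fc6", state="crashed", bootable=False)]
    print(list_domains(doms))
```

The point of the sketch is that a failed restart is handled inside the per-domain refresh, so one broken guest cannot abort the listing of the others. Whether the fallback after a failed restart should be destroy or preserve is a separate policy choice.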
Re comment#4: Should on_crash: restart really destroy the domain (like on_crash: destroy) when restarting fails, or should it preserve it (like on_crash: preserve)?
Interesting. As a workaround, I tried setting on_crash to destroy in the config file and then running "xm list", but I'm still getting the same error. I'm wondering if there's a workaround from here or if I have to reboot the machine to get back to a usable "xm list". Thanks.
Ran into this problem again, with a slightly different scenario. This time, I was playing around with using domU's via a mounted USB hd. The USB hd had ntfs on it, and I was using fuse to mount it. Everything _seemed_ to be fine with the mount point and the drive. I copied my domU files from my local HD to this USB drive mount point, then edited my xen config file to look like this:

# Automatically generated xen config file
name = "rhel4u5-node3"
memory = "384"
disk = [ 'tap:aio:/mnt/sdb1/xen/images/rhel4u5-node3,xvda,w', ]
#disk = [ 'tap:aio:/mnt/xen-images/rhel4u5-node3,xvda,w', ]
vif = [ 'mac=00:16:3E:40:A9:83, bridge=br1', ]
#vfb = ["type=vnc,vncunused=1"]
uuid = "b39cc0c0-70e2-d085-889a-deadbeef0de3"
bootloader="/usr/bin/pygrub"
vcpus=2
on_reboot = 'restart'
on_crash = 'destroy'

The domain started to boot, but hung at the "Reading all physical volumes." line (from LVM scanning actually). Here's the snippet of the console:

rtc: IRQ 8 is not free.
i8042.c: No controller found.
rtc: IRQ 8 is not free.
i8042.c: No controller found.
Red Hat nash version 4.2.1.10 starting
Reading all physical volumes. This may take a while...

In another window I then did an "xm shutdown <dom#>" and now have the broken xm list again.
Any luck reproducing it, Dave?

Michal
I am unable to reproduce on later RPMs. I attempted to reproduce this with a configuration similar to my original report - an invalid "root=" line and "on_crash = 'restart'". I did see "xm list" fail repeatedly for a period of time, but now it is fine.

The RPMs on my machine now are:

# rpm -qa | grep xen | sort
kernel-xen-2.6.18-164.11.1.el5.x86_64
kernel-xen-2.6.18-164.2.1.el5.x86_64
kernel-xen-2.6.18-164.el5.x86_64
kernel-xen-devel-2.6.18-164.11.1.el5.x86_64
kernel-xen-devel-2.6.18-164.2.1.el5.x86_64
kernel-xen-devel-2.6.18-164.el5.x86_64
xen-3.0.3-94.el5_4.3.x86_64
xen-libs-3.0.3-94.el5_4.3.i386
xen-libs-3.0.3-94.el5_4.3.x86_64

# uname -a
Linux dhcp231-162.rdu.redhat.com 2.6.18-164.11.1.el5xen #1 SMP Wed Jan 6 13:43:3

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Ok, if it's not reproducible using the -94 version of the xen package, then I'm closing this as CURRENTRELEASE.

Michal