Bug 228511 - xen domain auto startup does not work reliable
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Hardware: i386 Linux
Priority: medium  Severity: high
Assigned To: Xen Maintainance List
Depends On:
Blocks: 492190
Reported: 2007-02-13 09:54 EST by Markus Kremer
Modified: 2009-05-01 16:26 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-04-22 06:34:44 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
tgz logs and config (42.75 KB, application/octet-stream)
2007-03-28 03:46 EDT, Markus Kremer
Description Markus Kremer 2007-02-13 09:54:43 EST
Description of problem:
When many domains are configured for automatic startup and the host is rebooted, not all domains are started.
Messages from /var/log/xen/xend.log:
[2007-02-13 08:41:29 xend 2209] INFO (image:138) buildDomain os=linux dom=7 vcpus=1
[2007-02-13 08:41:35 xend 2209] INFO (image:214) configuring linux guest
[2007-02-13 08:41:37 xend 2209] INFO (image:138) buildDomain os=linux dom=8 vcpus=1
[2007-02-13 08:41:39 xend 2209] INFO (XendDomain:370) Domain vm_5 (8) unpaused.
[2007-02-13 08:41:39 xend.XendDomainInfo 2209] WARNING (XendDomainInfo:875) Domain has crashed: name=vm_4 id=7.
[2007-02-13 08:41:40 xend.XendDomainInfo 2209] ERROR (XendDomainInfo:1661) VM vm_4 restarting too fast (13.252752 seconds since the last restart).  Refusing to restart to avoid loops.
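
For anyone triaging a longer xend.log, here is a minimal Python sketch (not part of the original report) that counts the two message types quoted above per domain. The regular expressions are modeled only on this excerpt and assume each message sits on a single line in the log file; the function name `summarize` and the `SAMPLE` text are illustrative.

```python
import re
from collections import Counter

# Patterns modeled on the two xend.log messages quoted in this report.
CRASHED = re.compile(r"Domain has crashed: name=(?P<name>\S+) id=(?P<id>\d+)\.")
TOO_FAST = re.compile(r"VM (?P<name>\S+) restarting too fast \((?P<secs>[\d.]+) seconds")


def summarize(log_text):
    """Count crash and refused-restart messages per domain name."""
    crashes, refused = Counter(), Counter()
    for line in log_text.splitlines():
        m = CRASHED.search(line)
        if m:
            crashes[m.group("name")] += 1
        m = TOO_FAST.search(line)
        if m:
            refused[m.group("name")] += 1
    return crashes, refused


# Sample input: the two lines from the report, unwrapped.
SAMPLE = (
    "[2007-02-13 08:41:39 xend.XendDomainInfo 2209] WARNING (XendDomainInfo:875) "
    "Domain has crashed: name=vm_4 id=7.\n"
    "[2007-02-13 08:41:40 xend.XendDomainInfo 2209] ERROR (XendDomainInfo:1661) "
    "VM vm_4 restarting too fast (13.252752 seconds since the last restart).  "
    "Refusing to restart to avoid loops.\n"
)

crashes, refused = summarize(SAMPLE)
print(dict(crashes), dict(refused))
```

Running it over the whole log after each reboot would show which domains hit the "restarting too fast" guard rather than failing for some other reason.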

Version-Release number of selected component (if applicable):
Version=5 beta 2
CPU=Intel(R) Pentium(R) 4 CPU 3.00GHz  (no HT/SMP enabled)

How reproducible:
Every time; some VMs are missing after each reboot.

Steps to Reproduce: 
- ks install server using base + @virtualisation packages 
- ks install 9 guests using virt-install
- sed -ie 's/XENDOMAINS_SAVE=.*/XENDOMAINS_SAVE=/' /etc/sysconfig/xendomains  # this does a shutdown instead of suspend
- ln -s /etc/xen/MY_VMS_* /etc/xen/auto
- do reboot
- after reboot verify that all VMs are started
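
The final verification step can be scripted. The sketch below is an illustration, not the reporter's method: it takes the text printed by `xm list` and returns the expected domain names that are absent. The helper name `missing_domains` is hypothetical, the sample output is made up for the example, and the code assumes the usual layout of one header row followed by one row per domain with the name in the first column.

```python
def missing_domains(xm_list_output, expected):
    """Return the expected domain names that do not appear in `xm list` output."""
    running = set()
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            running.add(fields[0])  # first column is the domain name
    return sorted(set(expected) - running)


# Made-up sample output for illustration only.
SAMPLE_XM_LIST = """\
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      512     1 r-----    120.5
vm_1                                       1      256     1 -b----     10.2
vm_2                                       2      256     1 -b----      9.8
"""

expected = ["vm_%d" % i for i in range(1, 4)]
print(missing_domains(SAMPLE_XM_LIST, expected))
```

In practice the first argument would be the captured output of `/usr/sbin/xm list` taken some minutes after boot.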

Actual results:

I did 13 reboots. This is how often each VM came up automatically. 
vm_1 13
vm_2 13
vm_3 12
vm_4 11
vm_5 9
vm_6 6
vm_7 5
vm_8 6
vm_9 6
So vm_7 started only 5 times in 13 tries.
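
To make the trend easier to see, the counts above can be turned into per-VM start rates. This is a small sketch using only the numbers reported here; the helper name `start_rate` is illustrative.

```python
REBOOTS = 13

# Automatic start counts per VM, copied from the results above.
starts = {
    "vm_1": 13, "vm_2": 13, "vm_3": 12, "vm_4": 11, "vm_5": 9,
    "vm_6": 6, "vm_7": 5, "vm_8": 6, "vm_9": 6,
}


def start_rate(count, reboots=REBOOTS):
    """Fraction of reboots after which the VM came up automatically."""
    return count / reboots


for name, count in starts.items():
    print("%s %2d/%d  %.0f%%" % (name, count, REBOOTS, 100 * start_rate(count)))

total = sum(starts.values())
possible = REBOOTS * len(starts)
print("overall: %d of %d possible starts" % (total, possible))
```

The rates fall off for the later VMs, which is consistent with the failures depending on start order.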

Expected results:
All VMs are started at every reboot.

Additional info:
fc6 with 2.6.19 kernel has similar behaviour.
The "Domain has crashed:" entries
Comment 1 Daniel Berrange 2007-03-27 11:39:03 EDT
Hmm, this is a little worrying - if it can't deal with multiple VMs starting in
very quick succession it sounds like there is some race condition/scalability
issue hiding in either HV or the XenD stack.

Can you reproduce this again & capture the output of 'xm dmesg' once booting has
completed - this will hopefully show if there are any hypervisor issues being
reported. Also, can you attach the full /var/log/xen/xend.log,
/var/log/xen/xend-debug.log, /var/log/xen/xen-hotplug.log and finally, if any are
HVM guests, also the qemu-dm-*.log files.

Finally, can you attach the /etc/xen config file for at least one of the guests
- if they are all basically the same config one is sufficient - if every VM is
different upload a representative set.
Comment 3 Markus Kremer 2007-03-28 03:46:31 EDT
Created attachment 151099
tgz logs and config

I am using only RHEL5 xen-guests, no HVM. (see first post)

[root@rhrc1s1 x]# crontab -l
01,31 * * * * /usr/sbin/xm list| logger -t XEN1
14,44 * * * * /usr/sbin/xm list| logger -t XEN2
15,45 * * * * /sbin/reboot

[root@rhrc1s1 x]# uname -a
Linux rhrc1s1 2.6.18-8.el5xen #1 SMP Fri Jan 26 14:42:21 EST 2007 i686 i686 i386 GNU/Linux
xm dmesg >var/log/xen/xm.dmesg.out
dmesg >var/log/xen/dmesg.out
The tgz file contains
Comment 4 Chris Lalancette 2008-03-27 01:09:56 EDT
We did some work in 5.1 to make this less likely to happen, but I'm not sure if
it is completely fixed.  Is this still a problem?

Chris Lalancette
Comment 5 Michal Novotny 2009-04-15 06:26:37 EDT
Well, I have tried it using my SRPMS, which can be found at http://people.redhat.com/minovotn/xen, and I found no problem. I booted 9 domains in total, and all of them booted correctly when testing on my box. The configuration was 4 PV and 5 FV machines...
Comment 7 Markus Kremer 2009-04-20 12:06:49 EDT
I am unable to reproduce the problem with RH 5.3 after setting dom0_mem=512M in grub.conf. My tests ran 20 256 RH5.3 64 bit udoms.
Please set the state to fixed.
Comment 8 Chris Lalancette 2009-04-22 06:34:44 EDT
OK, thanks for the testing!  Will close as FIXED.

Chris Lalancette
