Bug 228511

Summary: xen domain auto startup does not work reliably
Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.0
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Markus Kremer <mkremer>
Assignee: Xen Maintainance List <xen-maint>
CC: clalance, minovotn, syeghiay
Doc Type: Bug Fix
Last Closed: 2009-04-22 10:34:44 UTC
Bug Blocks: 492190
Attachments: tgz logs and config (flags: none)

Description Markus Kremer 2007-02-13 14:54:43 UTC
Description of problem:
When many domains are configured for automatic startup, not all of them are
started after the host reboots.
 
Messages from /var/log/xen/xend.log:
[2007-02-13 08:41:29 xend 2209] INFO (image:138) buildDomain os=linux dom=7 vcpus=1
[2007-02-13 08:41:35 xend 2209] INFO (image:214) configuring linux guest
[2007-02-13 08:41:37 xend 2209] INFO (image:138) buildDomain os=linux dom=8 vcpus=1
[2007-02-13 08:41:39 xend 2209] INFO (XendDomain:370) Domain vm_5 (8) unpaused.
[2007-02-13 08:41:39 xend.XendDomainInfo 2209] WARNING (XendDomainInfo:875) Domain has crashed: name=vm_4 id=7.
[2007-02-13 08:41:40 xend.XendDomainInfo 2209] ERROR (XendDomainInfo:1661) VM vm_4 restarting too fast (13.252752 seconds since the last restart).  Refusing to restart to avoid loops.
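
To pull just the affected domains out of a longer xend.log, something like the following could be used. This is a minimal sketch, not part of the original report: the function names crashed_domains and refused_restarts are made up for illustration, and the patterns assume each log entry sits on a single line in the file with exactly the wording shown in the excerpt above.

```shell
#!/bin/sh
# Print "name id" for every "Domain has crashed" entry in the given log(s).
# Assumption: each entry is one line in the real log file.
crashed_domains() {
    sed -n 's/.*Domain has crashed: name=\([^ ]*\) id=\([0-9]*\)\..*/\1 \2/p' "$@"
}

# Print "name seconds" for every domain that xend refused to restart
# because it went down again too soon after the previous start.
refused_restarts() {
    sed -n 's/.* VM \([^ ]*\) restarting too fast (\([0-9.]*\) seconds.*/\1 \2/p' "$@"
}
```

Typical use would be `crashed_domains /var/log/xen/xend.log` after a reboot cycle, to see which guests crashed during startup.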


Version-Release number of selected component (if applicable):
Version=5 beta 2
Hardware=ibmx306m
Memory=3GB
CPU=Intel(R) Pentium(R) 4 CPU 3.00GHz  (no HT/SMP enabled)
xen-libs-3.0.3-8.el5
xen-3.0.3-8.el5
kernel-xen-2.6.18-1.2747.el5

How reproducible:
Every time; some VMs are missing after each reboot.


Steps to Reproduce: 
- Kickstart-install the server using base + @virtualisation packages
- Kickstart-install 9 guests using virt-install
- sed -ie 's/XENDOMAINS_SAVE=.*/XENDOMAINS_SAVE=/' /etc/sysconfig/xendomains  # this does a shutdown instead of a suspend
- ln -s /etc/xen/MY_VMS_* /etc/xen/auto
- Reboot
- After the reboot, verify that all VMs are started
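
The final verification step can be scripted. The sketch below is an illustration only: missing_domains is a hypothetical helper, and it assumes you keep a plain-text file with one expected domain name per line (the names in /etc/xen/auto are config file names, which need not match the domain names). It reads a saved copy of the `xm list` output rather than calling xm itself.

```shell
#!/bin/sh
# missing_domains EXPECTED_FILE XM_LIST_FILE
# Prints every name from EXPECTED_FILE that does not appear in the first
# column of the saved "xm list" output in XM_LIST_FILE.
missing_domains() {
    expected_file=$1
    xm_list_file=$2
    # Skip the "Name ID Mem(MiB) ..." header and keep the domain names.
    running=$(awk 'NR > 1 { print $1 }' "$xm_list_file")
    while read -r name; do
        [ -n "$name" ] || continue
        # -x: whole-line match, -F: fixed string, so vm_1 does not match vm_10.
        printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done < "$expected_file"
}
```

For example, after `xm list > /tmp/xm.out`, running `missing_domains expected.txt /tmp/xm.out` prints nothing when every expected guest came up (expected.txt being the hypothetical name list).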

Actual results:

I did 13 reboots. This is how often each VM came up automatically. 
vm_1 13
vm_2 13
vm_3 12
vm_4 11
vm_5 9
vm_6 6
vm_7 5
vm_8 6
vm_9 6
 
So vm_7 started only 5 times in 13 tries.


Expected results:
All VMs are started at every reboot.

Additional info:
FC6 with a 2.6.19 kernel shows similar behaviour.
The "Domain has crashed:" entries

Comment 1 Daniel Berrangé 2007-03-27 15:39:03 UTC
Hmm, this is a little worrying - if it can't deal with multiple VMs starting in
very quick succession it sounds like there is some race condition/scalability
issue hiding in either HV or the XenD stack.

Can you reproduce this again and capture the output of 'xm dmesg' once booting
has completed? This will hopefully show whether any hypervisor issues are being
reported. Also, can you attach the full /var/log/xen/xend.log,
/var/log/xen/xend-debug.log and /var/log/xen/xen-hotplug.log, and, if any are
HVM guests, the qemu-dm-*.log files as well?

Finally, can you attach the /etc/xen config file for at least one of the guests
- if they are all basically the same config one is sufficient - if every VM is
different upload a representative set.


Comment 3 Markus Kremer 2007-03-28 07:46:31 UTC
Created attachment 151099
tgz logs and config

I am using only RHEL5 xen-guests, no HVM. (see first post)

[root@rhrc1s1 x]# crontab -l
01,31 * * * * /usr/sbin/xm list| logger -t XEN1
14,44 * * * * /usr/sbin/xm list| logger -t XEN2
15,45 * * * * /sbin/reboot
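
With this crontab, the XEN2 snapshot is logged one minute before each reboot, so counting how often each guest appears under that tag gives a per-VM startup tally like the one in the original description. The following is a rough sketch under stated assumptions: count_sightings is a made-up name, and the field positions assume the traditional syslog line layout "Mon DD HH:MM:SS host TAG: message".

```shell
#!/bin/sh
# count_sightings TAG < syslog-file
# Counts how many logged "xm list" lines with the given logger tag mention
# each guest, skipping the header row and Domain-0.
# Assumption: field 5 is "TAG:" and field 6 is the domain name.
count_sightings() {
    awk -v tag="$1:" '
        $5 == tag && $6 != "Name" && $6 != "Domain-0" { seen[$6]++ }
        END { for (name in seen) print name, seen[name] }
    ' | sort
}
```

It could be run as `count_sightings XEN2 < /var/log/messages`, assuming the logger output ends up in that file.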


[root@rhrc1s1 x]# uname -a
Linux rhrc1s1 2.6.18-8.el5xen #1 SMP Fri Jan 26 14:42:21 EST 2007 i686 i686 i386 GNU/Linux
xm dmesg >var/log/xen/xm.dmesg.out
dmesg >var/log/xen/dmesg.out
The tgz file contains
/var/log/xen/*
/etc/xen/*

Comment 4 Chris Lalancette 2008-03-27 05:09:56 UTC
We did some work in 5.1 to make this less likely to happen, but I'm not sure if
it is completely fixed.  Is this still a problem?

Thanks,
Chris Lalancette

Comment 5 Michal Novotny 2009-04-15 10:26:37 UTC
I tried this using my SRPMs, which can be found at http://people.redhat.com/minovotn/xen, and found no problem. I booted 9 domains in total (4 PV and 5 FV machines), and all of them booted correctly on my test box.

Comment 7 Markus Kremer 2009-04-20 16:06:49 UTC
Michal,
I am unable to reproduce the problem with RH 5.3 after setting dom0_mem=512M in grub.conf. My tests ran 20 256 RH5.3 64 bit udoms.
Please set the state to fixed.
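
For anyone wanting to confirm that the dom0_mem setting is actually present, a small check along these lines may help. It is only a sketch: dom0_mem_setting is a hypothetical helper, and it assumes the option sits on the hypervisor's "kernel" line of the boot entry, with the file usually found at /boot/grub/grub.conf.

```shell
#!/bin/sh
# dom0_mem_setting GRUB_CONF
# Prints the value of dom0_mem= from every uncommented "kernel" line,
# e.g. "512M". Prints nothing if the option is not set.
dom0_mem_setting() {
    sed -n 's/^[[:space:]]*kernel[[:space:]].*dom0_mem=\([^[:space:]]*\).*/\1/p' "$1"
}
```

For example, `dom0_mem_setting /boot/grub/grub.conf` should print 512M on a host configured as described in this comment.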

Comment 8 Chris Lalancette 2009-04-22 10:34:44 UTC
OK, thanks for the testing!  Will close as FIXED.

Chris Lalancette