Description of problem:

I have 8 FC6 guests called fc6_0 through fc6_7. If I use "xm create" to start them all at once in a tight loop, then the later domains crash after starting up:

  for i in 0 1 2 3 4 5 6 7; do /usr/sbin/xm create fc6_$i; done

Adding a 'sleep 5' inside the loop cures the problem, so it appears to be related to the high load and something inside xend/xenstored.

Version-Release number of selected component (if applicable):
xen-3.0.3-25.0.2.el5

How reproducible:
Easily seen on my test machine.

Steps to Reproduce:
1. Create 8 domains.
2. Try to start them all in quick succession.

Actual results:
The later domains start up but then crash. (This causes xend to try to restart them several times, but xend eventually gives up, so they fail permanently.)

Expected results:
They shouldn't crash.

Additional info:
I'm attaching xend.log, xend-debug.log and xen-hotplug.log from one such run.
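For reference, a self-contained sketch of the reproducer and the pacing workaround. The dry-run echo, the start_all function name and the SLEEP knob are my additions so this can be exercised outside a Xen dom0; only the xm loop itself is from the report.

```shell
#!/bin/sh
# Sketch of the reproducer.  XM is prefixed with "echo" so this is a
# dry run that works without a Xen host -- drop the "echo" on a real dom0.
XM="echo /usr/sbin/xm"
SLEEP=${SLEEP:-0}   # set SLEEP=5 on a real dom0 to apply the workaround pacing

start_all() {
    for i in 0 1 2 3 4 5 6 7; do
        $XM create "fc6_$i"
        sleep "$SLEEP"
    done
}

start_all
```

With SLEEP=0 this is the tight loop that triggers the crashes; with SLEEP=5 it is the workaround described above.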
Created attachment 155716 [details] xend.log
Created attachment 155717 [details] xend-debug.log
Created attachment 155718 [details] xen-hotplug.log
Dan has suggested trying: http://post-office.corp.redhat.com/archives/virtualist/2007-April/msg00239.html
I applied the suggested change to /etc/init.d/xend, rebooted the machine, and checked that the bind-mount was in place and that the tdb files were being stored there. Unfortunately it made no difference: domains still fail at random, at roughly the same rate.
Created attachment 155723 [details] dmesg On this run, domains fc6_4, fc6_6 and fc6_7 failed.
Created attachment 155724 [details] xm dmesg On this run, domains fc6_4, fc6_6 and fc6_7 failed. Nothing was sent to xm dmesg during the failed run.
Created attachment 155725 [details] /var/log/messages On this run, domains fc6_4, fc6_6 and fc6_7 failed.
A guest crashing early in boot is almost always a problem with device hotplug / setup. The logs all show the Dom0 side of device hotplug is working correctly, so I can only assume the problem is on the guest side. One promising bit of code is in drivers/xen/xenbus/xenbus_probe.c:

  /*
   * On a 10 second timeout, wait for all devices currently configured.  We need
   * to do this to guarantee that the filesystems and / or network devices
   * needed for boot are available, before we can allow the boot to proceed.
   *
   * This needs to be on a late_initcall, to happen after the frontend device
   * drivers have been initialised, but before the root fs is mounted.
   *
   * A possible improvement here would be to have the tools add a per-device
   * flag to the store entry, indicating whether it is needed at boot time.
   * This would allow people who knew what they were doing to accelerate their
   * boot slightly, but of course needs tools or manual intervention to set up
   * those flags correctly.
   */
  static void wait_for_devices(struct xenbus_driver *xendrv)
  {
          unsigned long timeout = jiffies + 10*HZ;
          struct device_driver *drv = xendrv ? &xendrv->driver : NULL;

          if (!ready_to_wait_for_devices || !is_running_on_xen())
                  return;

          while (exists_disconnected_device(drv)) {
                  if (time_after(jiffies, timeout))
                          break;
                  schedule_timeout_interruptible(HZ/10);
          }

          bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
                           print_device_status);
  }

I wouldn't be at all surprised if, under your extreme load, the 10 second timeout was not long enough. If the timeout expires without the devices connecting, nothing here logs an error, and if the disconnected device is the disk, the guest would crash very shortly afterwards, missing its root device. So I think this is worth exploring. Thus I'm doing an experimental kernel build which increases the timeout to 120 seconds and printk()'s every 10 seconds if devices are not yet set up.
Created attachment 155811 [details] Xen console from a guest when it crashes
Created attachment 155812 [details] xenconsoled logging (patch 1/2)
Created attachment 155813 [details] xenconsoled logging (patch 2/2)

Note: you need both patches to get proper logging. After applying them, create /var/log/xen/console/ and reboot (or restart xenconsoled). A file is then created for each guest, named /var/log/xen/console/guest-<DOMID>.log. The file gets overwritten if a dom ID is reused (eg. after a reboot).
Created attachment 155816 [details] /var/log/debug

(1) The guests are all file-backed, so I changed tap:aio:... to file:... No difference.

(2) Back with tap:aio again, I edited syslog.conf so that *.* -> /var/log/debug. Attached is the resulting file. In it, for example, dom 109 boots successfully whereas dom 113 crashes, yet I cannot see any difference in the debug output between the two domains.
At Dan's suggestion I installed this modified guest kernel which increases the wait_for_devices delay from 10 seconds to 120 seconds: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=802116 Sadly even with this kernel, guests still crash in the same place. I diffed the console logs from before and after and the only differences are things like kernel version numbers, so it looks like exactly the same crash.
OK, finally found the problem. When we load xenblk.ko it scans for partitions, but under load the scan takes quite a while. In normal use you see:

  Loading xenblk.ko module
  Registering block device major 202
   xvda: xvda1 xvda2
  [output of /init continues]

but when it fails you see:

  Loading xenblk.ko module
  Registering block device major 202
   xvda:
  [output of /init continues]
  [some time later ...]
   xvda1 xvda2

In particular, vgscan starts to run while xenblk.ko is still doing its partition scan. As a result, when vgscan queries the kernel for partitions, it doesn't see xvda1 and xvda2.
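To illustrate the missing synchronisation: anything that needs the partitions (vgscan here) would have to wait until they actually appear in the kernel's view. A hedged sketch of such a wait, polling /proc/partitions; the function name and the FILE/TRIES parameters are mine, added so the sketch can be tested against a fake partition list. This is an illustration, not the actual fix (see the proposed patch below in this bug).

```shell
#!/bin/sh
# wait_for_partition NAME [FILE] [TRIES]
# Poll the partition list (default /proc/partitions) until NAME appears,
# giving up after TRIES polls of 0.1s each.  Returns 0 if found, 1 on timeout.
wait_for_partition() {
    name=$1 file=${2:-/proc/partitions} tries=${3:-100}
    while [ "$tries" -gt 0 ]; do
        grep -qw "$name" "$file" && return 0   # found: safe to run vgscan etc.
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1                                   # timed out: partition never showed up
}
```

Run before vgscan in the initrd, something like this would close the race shown above; the real fix went into the xenblk probe path instead.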
Created attachment 155883 [details] Console log from a failing guest with vgscan -vvvvvv showing race
The mkinitrd code contains this, after block devices are loaded and before trying raid/DM scanning:

  # HACK: module loading + device creation isn't necessarily synchronous...
  # this will make sure that we have all of our devices before trying
  # things like RAID or LVM
  emit "mkblkdevs"

The undocumented mkblkdevs (a nash command) just iterates over /sys/block/ and creates corresponding entries in /dev. I've no idea how that is supposed to provide any sort of synchronisation, unless going into /sys/block/xvda/ is supposed to hang during partition checking (which evidently it doesn't).
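To make that concrete, here is a rough re-creation of what mkblkdevs does, based on the description above. The function name and the directory parameter are mine, and the mknod calls are echoed rather than executed so the sketch runs without privileges and against a fake sysfs tree; the real nash builtin walks /sys/block and calls mknod(2).

```shell
#!/bin/sh
# mkblkdevs_sketch SYSBLOCK: for every */dev and */*/dev node file under
# SYSBLOCK, print the mknod that mkblkdevs would perform.  Crucially this
# is a one-shot snapshot of /sys/block -- a partition the kernel registers
# a moment later is simply missed, which is why it provides no
# synchronisation against an in-progress partition scan.
mkblkdevs_sketch() {
    sysblock=$1
    for devfile in "$sysblock"/*/dev "$sysblock"/*/*/dev; do
        [ -f "$devfile" ] || continue
        IFS=: read -r major minor < "$devfile"      # sysfs "dev" holds MAJOR:MINOR
        name=$(basename "$(dirname "$devfile")")
        echo "mknod /dev/$name b $major $minor"
    done
}
```

Running it against a tree where only xvda (and not yet xvda1/xvda2) has appeared reproduces exactly the failure mode seen with vgscan.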
Proposed patch: http://lkml.org/lkml/2007/6/12/149
Does the xenblk-only, upstream patch in bug 247265 fix this?
Yes. I have now tested the Xen upstream fix and it works.
I'm actually going to close this one out as a dup of 247265, for tracking purposes. Chris Lalancette *** This bug has been marked as a duplicate of 247265 ***