Bug 241793

Summary: xm create many domains in a tight loop, later domains crash after starting
Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.0
Hardware: All
OS: Linux
Reporter: Richard W.M. Jones <rjones>
Assignee: Richard W.M. Jones <rjones>
CC: clalance, ehabkost, xen-maint
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Doc Type: Bug Fix
Last Closed: 2007-08-14 13:36:28 UTC
Bug Blocks: 247265

Attachments (all flags: none):
  xend.log
  xend-debug.log
  xen-hotplug.log
  dmesg
  xm dmesg
  /var/log/messages
  Xen console from a guest when it crashes
  xenconsoled logging (patch 1/2)
  xenconsoled logging (patch 2/2)
  /var/log/debug
  Console log from a failing guest with vgscan -vvvvvv showing race

Description Richard W.M. Jones 2007-05-30 17:58:13 UTC
Description of problem:

I have 8 FC6 guests, called fc6_0 through fc6_7.  If I use xm create to start
them all at once in a tight loop, then the later domains crash after starting up.

  for i in 0 1 2 3 4 5 6 7; do /usr/sbin/xm create fc6_$i; done

Adding a 'sleep 5' inside the loop cures the problem, so it appears to be
related to the high load and something inside xend/xenstored.

Version-Release number of selected component (if applicable):

xen-3.0.3-25.0.2.el5

How reproducible:

Easily seen on my test machine.

Steps to Reproduce:
1. Create 8 domains.
2. Try to start them all in quick succession.
  
Actual results:

Later domains start up but then crash.  (This causes xend to try to restart them
several times, but xend eventually gives up so they fail permanently).

Expected results:

They shouldn't crash.

Additional info:

I'm attaching xend.log, xend-debug.log and xen-hotplug.log from one such run.

Comment 1 Richard W.M. Jones 2007-05-30 17:58:13 UTC
Created attachment 155716 [details]
xend.log

Comment 2 Richard W.M. Jones 2007-05-30 17:58:46 UTC
Created attachment 155717 [details]
xend-debug.log

Comment 3 Richard W.M. Jones 2007-05-30 17:59:10 UTC
Created attachment 155718 [details]
xen-hotplug.log

Comment 4 Richard W.M. Jones 2007-05-30 17:59:55 UTC
Dan has suggested trying:

http://post-office.corp.redhat.com/archives/virtualist/2007-April/msg00239.html

Comment 5 Richard W.M. Jones 2007-05-30 18:17:55 UTC
I applied the suggested change to /etc/init.d/xend, rebooted the machine,
checked that the bind-mount was in place and that the tdb files were being
stored there.

Unfortunately it didn't make any difference.  Domains still fail randomly with
roughly the same regularity.

Comment 6 Richard W.M. Jones 2007-05-30 18:30:46 UTC
Created attachment 155723 [details]
dmesg

On this run, domains fc6_4, fc6_6 and fc6_7 failed.

Comment 7 Richard W.M. Jones 2007-05-30 18:31:38 UTC
Created attachment 155724 [details]
xm dmesg

On this run, domains fc6_4, fc6_6 and fc6_7 failed.

Nothing was sent to xm dmesg during the failed run.

Comment 8 Richard W.M. Jones 2007-05-30 18:31:57 UTC
Created attachment 155725 [details]
/var/log/messages

On this run, domains fc6_4, fc6_6 and fc6_7 failed.

Comment 9 Daniel Berrangé 2007-05-30 20:31:47 UTC
A guest crashing early in boot is almost always a problem with device hotplug /
setup. The logs all show the Dom0  side of device hotplug is working correctly,
so I can only assume the problem is in the guest side.

One promising bit of code is in drivers/xen/xenbus/xenbus_probe.c:

/*
 * On a 10 second timeout, wait for all devices currently configured.  We need
 * to do this to guarantee that the filesystems and / or network devices
 * needed for boot are available, before we can allow the boot to proceed.
 *
 * This needs to be on a late_initcall, to happen after the frontend device
 * drivers have been initialised, but before the root fs is mounted.
 *
 * A possible improvement here would be to have the tools add a per-device
 * flag to the store entry, indicating whether it is needed at boot time.
 * This would allow people who knew what they were doing to accelerate their
 * boot slightly, but of course needs tools or manual intervention to set up
 * those flags correctly.
 */
static void wait_for_devices(struct xenbus_driver *xendrv)
{
        unsigned long timeout = jiffies + 10*HZ;
        struct device_driver *drv = xendrv ? &xendrv->driver : NULL;

        if (!ready_to_wait_for_devices || !is_running_on_xen())
                return;

        while (exists_disconnected_device(drv)) {
                if (time_after(jiffies, timeout))
                        break;
                schedule_timeout_interruptible(HZ/10);
        }

        bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
                         print_device_status);
}


I wouldn't be at all surprised if, under your extreme load, the 10 second timeout
was not long enough. If the 10 second timeout were exceeded without devices
connecting, nothing here logs an error, and if the device in question is the
disk, the guest would crash very shortly afterwards because it is missing its
root device. So I think this is worth exploring. Thus I'm doing an experimental
kernel build which increases the timeout to 120 seconds, and printk()'s every
10 seconds if devices are not set up...
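
As a rough illustration of that experiment (a sketch of the same idea applied to
the function quoted above, not the actual experimental build):

/* Sketch only: timeout raised from 10*HZ to 120*HZ, with a progress message
 * every 10 seconds while devices are still disconnected. */
static void wait_for_devices(struct xenbus_driver *xendrv)
{
        unsigned long timeout = jiffies + 120*HZ;
        unsigned long next_report = jiffies + 10*HZ;
        struct device_driver *drv = xendrv ? &xendrv->driver : NULL;

        if (!ready_to_wait_for_devices || !is_running_on_xen())
                return;

        while (exists_disconnected_device(drv)) {
                if (time_after(jiffies, timeout))
                        break;
                if (time_after(jiffies, next_report)) {
                        printk(KERN_WARNING "xenbus_probe: still waiting "
                               "for devices to connect\n");
                        next_report = jiffies + 10*HZ;
                }
                schedule_timeout_interruptible(HZ/10);
        }

        bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
                         print_device_status);
}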


Comment 10 Richard W.M. Jones 2007-05-31 13:16:58 UTC
Created attachment 155811 [details]
Xen console from a guest when it crashes

Comment 11 Richard W.M. Jones 2007-05-31 13:17:30 UTC
Created attachment 155812 [details]
xenconsoled logging (patch 1/2)

Comment 12 Richard W.M. Jones 2007-05-31 13:19:30 UTC
Created attachment 155813 [details]
xenconsoled logging (patch 2/2)

Note, you need both patches to get proper logging.

After applying them, create /var/log/xen/console/ and reboot or restart
xenconsoled.

A file will be created for each guest, called
/var/log/xen/console/guest-<DOMID>.log.  The file gets overwritten if a dom ID
is reused (e.g. after a reboot).

Comment 13 Richard W.M. Jones 2007-05-31 13:59:13 UTC
Created attachment 155816 [details]
/var/log/debug

(1) The guests are all file-backed so I changed tap:aio:... to file:...

No difference.

(2) Back with tap:aio again, I edited syslog.conf so that *.* ->
/var/log/debug.  Attached is the resulting file.  In this file, for example,
dom 109 boots successfully, whereas dom 113 crashes.

I cannot see any difference in the debug output between the two domains.

Comment 14 Richard W.M. Jones 2007-05-31 14:50:12 UTC
At Dan's suggestion I installed this modified guest kernel which increases the
wait_for_devices delay from 10 seconds to 120 seconds:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=802116

Sadly even with this kernel, guests still crash in the same place.

I diffed the console logs from before and after and the only differences are
things like kernel version numbers, so it looks like exactly the same crash.

Comment 15 Richard W.M. Jones 2007-06-01 14:15:53 UTC
OK, finally found the problem.  When we load xenblk.ko, it scans for partitions,
but under load it takes quite a while to do the scan.  In normal use you see:

Loading xenblk.ko module
Registering block device major 202
 xvda: xvda1 xvda2
[output of /init continues]

but when it fails you see:

Loading xenblk.ko module
Registering block device major 202
 xvda: [output of /init continues]
[some time later ...] xvda1 xvda2

In particular, vgscan starts to run while xenblk.ko is doing its thing.  As a
result when vgscan queries the kernel for partitions, it doesn't see xvda1 and
xvda2.
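
For what it's worth, the state that vgscan hits can be observed from userspace
simply by dumping /proc/partitions at that point in the boot.  Something like
the trivial program below (an illustration of the kernel-side view only, not of
how vgscan itself scans devices) prints xvda without xvda1/xvda2 while xenblk is
still part-way through its scan.

#include <stdio.h>

/* Illustration only: dump the kernel's partition list as seen at this point
 * in the boot.  While xenblk is still scanning, only the whole disk (xvda)
 * is listed and xvda1/xvda2 are missing -- the same view vgscan races
 * against. */
int main(void)
{
        char line[256];
        FILE *fp = fopen("/proc/partitions", "r");

        if (!fp) {
                perror("/proc/partitions");
                return 1;
        }
        while (fgets(line, sizeof(line), fp))
                fputs(line, stdout);
        fclose(fp);
        return 0;
}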

Comment 16 Richard W.M. Jones 2007-06-01 14:24:27 UTC
Created attachment 155883 [details]
Console log from a failing guest with vgscan -vvvvvv showing race

Comment 17 Richard W.M. Jones 2007-06-01 14:57:38 UTC
The mkinitrd code contains this, after block devices are loaded and before
trying raid/DM scanning:

  # HACK: module loading + device creation isn't necessarily synchronous...
  # this will make sure that we have all of our devices before trying
  # things like RAID or LVM
  emit "mkblkdevs"

The undocumented mkblkdevs (nash command) just iterates over /sys/block/ and
creates corresponding entries in /dev.  I've no idea how that is supposed to
cause any sort of synchronisation.  Unless going into /sys/block/xvda/ is
supposed to hang during partition checking (which evidently it doesn't).
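
For reference, the described behaviour amounts to roughly the following (a
userspace sketch of what the text above says mkblkdevs does, not the actual
nash source; the real command presumably also descends into the per-partition
subdirectories):

#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Rough sketch of the described mkblkdevs behaviour: walk /sys/block, read
 * each device's "dev" file (major:minor) and create the corresponding /dev
 * node.  Note that nothing here waits for partition scanning to finish,
 * which is why it provides no synchronisation against the xenblk race. */
int main(void)
{
        DIR *d = opendir("/sys/block");
        struct dirent *de;

        if (!d) {
                perror("/sys/block");
                return 1;
        }
        while ((de = readdir(d)) != NULL) {
                char path[512], devnode[512];
                unsigned int maj, min;
                FILE *fp;

                if (de->d_name[0] == '.')
                        continue;
                snprintf(path, sizeof(path), "/sys/block/%s/dev", de->d_name);
                fp = fopen(path, "r");
                if (!fp)
                        continue;
                if (fscanf(fp, "%u:%u", &maj, &min) == 2) {
                        snprintf(devnode, sizeof(devnode), "/dev/%s",
                                 de->d_name);
                        /* ignore EEXIST etc.; this is only an illustration */
                        mknod(devnode, S_IFBLK | 0600, makedev(maj, min));
                }
                fclose(fp);
        }
        closedir(d);
        return 0;
}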

Comment 18 Richard W.M. Jones 2007-06-27 14:22:27 UTC
Proposed patch: http://lkml.org/lkml/2007/6/12/149

Comment 19 Stephen Tweedie 2007-07-11 15:59:48 UTC
Does the xenblk-only, upstream patch in bug 247265 fix this?


Comment 20 Richard W.M. Jones 2007-07-12 14:30:13 UTC
Yes.  I have now tested the Xen upstream fix and it works.

Comment 21 Chris Lalancette 2007-08-14 13:36:28 UTC
I'm actually going to close this one out as a dup of 247265, for tracking purposes.

Chris Lalancette

*** This bug has been marked as a duplicate of 247265 ***