Description of problem:
I am starting and stopping four FC6 PV guests from scripts under heavy load. Occasionally a guest gets stuck in the paused state just after it begins to boot (just after xm start <domainname>).

Testing methodology: http://et.redhat.com/~rjones/xen-stress-tests/

Version-Release number of selected component (if applicable):
xen-3.1.0-0.rc7.1.fc7 + patch to fix bug 240009

How reproducible:
Occurs very infrequently, but definitely reproducible if the tests are left to run for a long time.

Steps to Reproduce:
1. Stress test under load, see: http://et.redhat.com/~rjones/xen-stress-tests/

Actual results:
Guests stay paused after booting. In the xm list output below, fc6-3 has this problem.

# /usr/sbin/xm list
Name                 ID   Mem VCPUs      State   Time(s)
Domain-0              0  2984     4     r-----  21370.4
centos5                   256     1                  0.2
fc6                 464   256     1     r-----     14.4
fc6-2               467   256     1     -b----      0.1
fc6-3               452   256     1     --p---      0.0
fc6-4               465   256     1     -b----     11.9
freebsd32                 256     1                  0.0

If the guest is manually unpaused, the boot continues as normal.

Expected results:
The guest should pause briefly while xend sets it up, then be automatically unpaused by xend.

Additional info:
I will attach xend.log and xend-debug.log in follow-ups.
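Not part of the original report, but for anyone automating the stress runs: the stuck guests are recognizable because they are paused with zero accumulated CPU time (an administratively paused guest would normally have run for a while first). A minimal sketch of that check, assuming the `xm list` column layout shown above (the function name and parsing details are illustrative, not a real xm API):

```python
def stuck_paused_domains(xm_list_output):
    """Return names of domains that are paused with zero CPU time,
    i.e. apparently paused since boot rather than paused on purpose.

    Assumes the standard `xm list` columns:
    Name  ID  Mem  VCPUs  State  Time(s)
    """
    stuck = []
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:  # managed-but-shutdown domains omit ID/State
            continue
        name, state, cpu_time = fields[0], fields[4], fields[5]
        if 'p' in state and float(cpu_time) == 0.0:
            stuck.append(name)
    return stuck
```

Run against the listing above, this would flag only fc6-3.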
Created attachment 154911 [details]
xend.log

This is xend.log, cut down so it starts just before the guest is booted. The domain of interest is ID 452, name fc6-3.
Created attachment 154912 [details]
xend-debug.log

This is xend-debug.log, cut down so it starts just before the guest is booted. The domain of interest is ID 452, name fc6-3.
(A reminder to capture xenstore-ls output next time this happens)
Created attachment 154917 [details]
Output of xenstore-ls with 3 domains paused this way

I now seem to have a reliable way to reproduce this bug: take a huge file (a 4 GB disk image from one of the guests) and copy it. Three domains were cycling while this was happening, and all three are now stuck paused.

# /usr/sbin/xm list
Name                 ID   Mem VCPUs      State   Time(s)
Domain-0              0  2984     4     r-----  23718.2
centos5                   256     1                  0.2
fc6                       256     1                 55.9
fc6-2               492   256     1     --p---      0.0
fc6-3               493   256     1     --p---      0.0
fc6-4               494   256     1     --p---      0.0
freebsd32                 256     1                  0.0

A message is produced when this happens; it comes from the xm start command itself, and it confirms the theory that the hotplug scripts are timing out:

+ /usr/sbin/xm start fc6-3
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.
Usage: xm start <DomainName>

Start a Xend managed domain

  -p, --paused    Do not unpause domain after starting it
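As a stop-gap while this bug is open, the stress scripts could unpause any guest they find stuck (the report notes that a manual unpause lets the boot continue). A hypothetical sketch of that workaround; the `run` parameter exists only so the command layer can be stubbed in tests, and in real use it would shell out to /usr/sbin/xm:

```python
import subprocess

def default_run(args):
    """Execute an xm command; raises if it exits non-zero."""
    subprocess.check_call(args)

def unpause_stuck(xm_list_output, run=default_run):
    """Issue `xm unpause` for every domain that is paused with zero
    CPU time, and return the list of domains that were unpaused.

    Assumes the standard `xm list` columns:
    Name  ID  Mem  VCPUs  State  Time(s)
    """
    unpaused = []
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:  # managed-but-shutdown domains omit ID/State
            continue
        name, state, cpu_time = fields[0], fields[4], fields[5]
        if 'p' in state and float(cpu_time) == 0.0:
            run(['/usr/sbin/xm', 'unpause', name])
            unpaused.append(name)
    return unpaused
```

This only papers over the symptom; the underlying hotplug-script timeout in xend would still need fixing.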
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
Assigning it to me to retest.
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I retested with my load-testing scripts a while back and didn't see anything like this, so I'm going to close this as WORKSFORME for now.