Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Guests get stuck in paused state when booting under heavy load|
|Product:||[Fedora] Fedora||Reporter:||Richard W.M. Jones <rjones>|
|Component:||xen||Assignee:||Richard W.M. Jones <rjones>|
|Status:||CLOSED WORKSFORME||QA Contact:|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2008-09-09 09:07:52 EDT||Type:||---|
Description Richard W.M. Jones 2007-05-17 08:55:03 EDT
Description of problem:
I am starting and stopping four FC6 PV guests from scripts under heavy load. Occasionally a guest gets stuck in the paused state just after it begins to boot (just after xm start <domainname>).

Methodology of testing: http://et.redhat.com/~rjones/xen-stress-tests/

Version-Release number of selected component (if applicable):
xen-3.1.0-0.rc7.1.fc7 + patch to fix bug 240009

How reproducible:
Occurs very infrequently, but definitely reproducible if the tests are left to run for a long time.

Steps to Reproduce:
1. Stress test under load, see: http://et.redhat.com/~rjones/xen-stress-tests/

Actual results:
Guests stay paused after booting. In the xm list below, fc6-3 has this problem.

# /usr/sbin/xm list
Name        ID   Mem VCPUs State   Time(s)
Domain-0     0  2984     4 r-----  21370.4
centos5          256     1             0.2
fc6        464   256     1 r-----     14.4
fc6-2      467   256     1 -b----      0.1
fc6-3      452   256     1 --p---      0.0
fc6-4      465   256     1 -b----     11.9
freebsd32        256     1             0.0

If the guest is manually unpaused then the boot continues as normal.

Expected results:
Guests should pause briefly while xend sets them up, then be automatically resumed by xend.

Additional info:
I will attach xend.log and xend-debug.log in followups.
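A stress-test harness could flag guests in this state by parsing the xm list output shown in the description. The following is only a minimal sketch, not part of the original test scripts: the names parse_xm_list and stuck_paused are hypothetical, and it assumes the column layout seen in this report, where running domains have six fields and defined-but-inactive domains lack the ID and State columns.

```python
from typing import List, NamedTuple, Optional

# Sample taken from the "Actual results" section of this report.
SAMPLE_XM_LIST = """\
Name ID Mem VCPUs State Time(s)
Domain-0 0 2984 4 r----- 21370.4
centos5 256 1 0.2
fc6 464 256 1 r----- 14.4
fc6-2 467 256 1 -b---- 0.1
fc6-3 452 256 1 --p--- 0.0
fc6-4 465 256 1 -b---- 11.9
freebsd32 256 1 0.0
"""


class Domain(NamedTuple):
    name: str
    dom_id: Optional[int]  # None for domains that are not running
    state: str             # e.g. "--p---"; empty if not running
    time_s: float


def parse_xm_list(text: str) -> List[Domain]:
    """Parse `xm list` output (layout assumed from this report)."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Name":
            continue  # skip blank lines and the header row
        if len(fields) == 6:
            name, dom_id, _mem, _vcpus, state, time_s = fields
            domains.append(Domain(name, int(dom_id), state, float(time_s)))
        elif len(fields) == 4:
            # Inactive managed domain: no ID and no State column.
            name, _mem, _vcpus, time_s = fields
            domains.append(Domain(name, None, "", float(time_s)))
    return domains


def stuck_paused(domains: List[Domain]) -> List[str]:
    """Return names of domains whose state flags include 'p' (paused)."""
    return [d.name for d in domains if "p" in d.state]


if __name__ == "__main__":
    print(stuck_paused(parse_xm_list(SAMPLE_XM_LIST)))
```

A guest is paused briefly during normal startup, so a harness would probably want to report a domain only if it is still paused on a later poll.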
Comment 1 Richard W.M. Jones 2007-05-17 08:57:11 EDT
Created attachment 154911 [details]
xend.log

This is xend.log, cut down so it starts just before the guest is booted. Domain of interest is ID 452, name fc6-3.
Comment 2 Richard W.M. Jones 2007-05-17 08:57:47 EDT
Created attachment 154912 [details]
xend-debug.log

This is xend-debug.log, cut down so it starts just before the guest is booted. Domain of interest is ID 452, name fc6-3.
Comment 3 Richard W.M. Jones 2007-05-17 09:22:45 EDT
(A reminder to capture xenstore-ls output next time this happens)
Comment 4 Richard W.M. Jones 2007-05-17 09:30:21 EDT
Created attachment 154917 [details]
Output of xenstore-ls with 3 domains paused this way

Now I seem to have a reliable way to reproduce this bug. What I do is take a huge file (a 4GB disk image from one of the guests) and copy it. Three domains were cycling while this was happening, and all 3 are now stuck paused.

# /usr/sbin/xm list
Name        ID   Mem VCPUs State   Time(s)
Domain-0     0  2984     4 r-----  23718.2
centos5          256     1             0.2
fc6              256     1            55.9
fc6-2      492   256     1 --p---      0.0
fc6-3      493   256     1 --p---      0.0
fc6-4      494   256     1 --p---      0.0
freebsd32        256     1             0.0

There is a message produced when this happens; it comes from the xm start command itself, and it confirms the theory that the hotplug scripts are timing out:

+ /usr/sbin/xm start fc6-3
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.
Usage: xm start <DomainName>

Start a Xend managed domain
  -p, --paused                   Do not unpause domain after starting it
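Since the failure announces itself in the output of xm start, a test script could record it at the moment it happens rather than discovering the paused guest later. The sketch below is an assumption-laden illustration: start_domain and is_hotplug_failure are hypothetical names, the marker string is copied from the error quoted above, and because this report does not say whether xm writes the message to stdout or stderr, both streams are merged.

```python
import subprocess
from typing import Tuple

# Substring of the error message quoted in this comment.
HOTPLUG_MARKER = "Hotplug scripts not working"


def is_hotplug_failure(output: str) -> bool:
    """True if the captured `xm start` output contains the hotplug error."""
    return HOTPLUG_MARKER in output


def start_domain(name: str, xm: str = "/usr/sbin/xm") -> Tuple[int, bool]:
    """Run `xm start <name>`; return (exit status, hotplug failure seen)."""
    result = subprocess.run(
        [xm, "start", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams; the source stream is unknown
        universal_newlines=True,
    )
    return result.returncode, is_hotplug_failure(result.stdout)
```

Calling start_domain("fc6-3") requires a host with xend running; is_hotplug_failure can be exercised on its own against saved output.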
Comment 5 Bug Zapper 2008-04-03 20:44:43 EDT
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
Comment 6 Richard W.M. Jones 2008-04-04 06:15:06 EDT
Assigning it to me to retest.
Comment 7 Bug Zapper 2008-05-13 22:54:30 EDT
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 8 Richard W.M. Jones 2008-09-09 09:07:52 EDT
I retested with my load testing scripts a while back and didn't see anything like this, so I'm going to assume WORKSFORME for now.