Bug 240651 - After stress testing, xend becomes unresponsive to socket connections
Summary: After stress testing, xend becomes unresponsive to socket connections
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: xen
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Daniel Berrangé
QA Contact:
URL:
Whiteboard: bzcl34nup
Depends On:
Blocks:
 
Reported: 2007-05-19 11:48 UTC by Richard W.M. Jones
Modified: 2008-04-04 10:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-04-04 10:16:59 UTC
Type: ---
Embargoed:



Description Richard W.M. Jones 2007-05-19 11:48:25 UTC
Description of problem:

After overnight stress testing, xend no longer responds to connections on its
control socket /var/run/xend/xmlrpc.sock.  This means that basic commands such
as 'xm list' hang.
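
Because the symptom is a command that never returns, a stress-test harness can detect it by running the command under a timeout. The following is a minimal sketch, not part of the original stress tests; the helper name `command_hangs` and the 10 second default are assumptions chosen for illustration.

```python
import subprocess


def command_hangs(argv, timeout_s=10.0):
    """Return True if the command does not finish within timeout_s seconds.

    Hypothetical helper: a hung 'xm list' would show up here as a timeout
    instead of blocking the test harness forever.
    """
    try:
        # subprocess.run kills the child and raises if the timeout expires.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

Calling `command_hangs(["xm", "list"])` between stress-test iterations would flag the first iteration after which xend stops answering.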

Methodology of stress tests: http://et.redhat.com/~rjones/xen-stress-tests/

Version-Release number of selected component (if applicable):

xen-3.1.0-0.rc7.1.fc7

How reproducible:

Happened at least twice on my test machine.  Fairly reproducible if I run the
stress tests for an extended period of time.

Steps to Reproduce:
1. Run the stress tests with 8 guests
(http://et.redhat.com/~rjones/xen-stress-tests/).
 
Actual results:

virt-manager hangs (no updates, cannot be closed).

xm list hangs.

strace of xm list shows:

23848 connect(3, {sa_family=AF_FILE, path="/var/run/xend/xmlrpc.sock"}, 27) = 0
23848 sendto(3, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\n", 132, 0, NULL, 0) = 132
23848 sendto(3, "<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</methodName>\n<params>\n<param>\n<value><boolean>1</boolean></value>\n</param>\n<param>\n<value><string>all</string></value>\n</param>\n<param>\n<value><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 268, 0, NULL, 0) = 268
23848 recvfrom(3, 0x2aaaaab3a7d4, 1, 0, 0, 0) = ? ERESTARTSYS (To be restarted)

(the final recvfrom hangs - here I hit ^C).
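
The exchange in this trace can be reproduced without xm, which makes it easier to tell a hung daemon from a broken client. The sketch below sends the same XML-RPC call (`xend.domains_with_state` with arguments `True, "all", 0`) over the Unix socket and waits for the first byte of the reply with a timeout. The function names, the return values, and the 5 second timeout are assumptions made for illustration; only the socket path, the HTTP request line, and the method call come from the trace.

```python
import socket
import xmlrpc.client

XEND_SOCKET = "/var/run/xend/xmlrpc.sock"  # path taken from the strace output


def build_request(method, params):
    """Build an HTTP/1.0 XML-RPC POST like the one xm sends."""
    body = xmlrpc.client.dumps(params, methodname=method).encode("utf-8")
    head = (
        "POST /RPC2 HTTP/1.0\r\n"
        "Host: \r\n"
        "Content-Type: text/xml\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def probe(path=XEND_SOCKET, timeout_s=5.0):
    """Return 'ok', 'timeout', or 'error' for a single request.

    'timeout' corresponds to the hang in the report: connect and send
    succeed, but no reply byte ever arrives.
    """
    request = build_request("xend.domains_with_state", (True, "all", 0))
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout_s)
            sock.connect(path)
            sock.sendall(request)
            reply = sock.recv(1)
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
    return "ok" if reply else "error"
```

A result of `"timeout"` would match the behaviour seen here, while `"error"` would point at a missing socket or a daemon that is not running at all.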

strace of xend shows:

23849 recvfrom(35, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\n<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</methodName>\n<params>\n<param>\n<value><boolean>1</boolean></value>\n</param>\n<param>\n<value><string>all</string></value>\n</param>\n<param>\n<value><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 8192, 0, NULL, NULL) = 400
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9d10c0, FUTEX_WAIT, 0, NULL <unfinished ...>
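
The last line of this trace is a FUTEX_WAIT with a NULL timeout, which is how a thread blocked indefinitely on a lock typically appears in strace. The report does not establish what xend is waiting for, so the following is only an illustration of that pattern and not xend's code: `state_lock` and `handle_request` are hypothetical names. It shows how a handler that needs a lock held elsewhere never replies, and how a bounded acquire turns the hang into a visible error.

```python
import threading

# Hypothetical lock standing in for whatever the daemon thread waits on.
state_lock = threading.Lock()


def handle_request(timeout_s=None):
    """Serve one request that needs state_lock and return a reply string.

    With timeout_s=None the acquire blocks forever, which matches a
    FUTEX_WAIT with no timeout. With a number, the handler gives up.
    """
    wait = -1 if timeout_s is None else timeout_s
    if not state_lock.acquire(timeout=wait):
        return "busy: lock not available"
    try:
        return "domain list"
    finally:
        state_lock.release()
```

If another thread holds `state_lock` and never releases it, `handle_request()` blocks just as the trace shows, while `handle_request(timeout_s=5)` returns the busy message after five seconds.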


Expected results:

xend should not hang.

Additional info:

Restarting xend fixes the problem.

Comment 1 Richard W.M. Jones 2007-11-19 15:34:32 UTC
Setting NEEDINFO on myself - I need to retest whether this is
still happening with a more recent xend.

Comment 2 Bug Zapper 2008-04-04 00:47:38 UTC
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 3 Richard W.M. Jones 2008-04-04 10:16:59 UTC
I have not seen this one for a very long time.  If it recurs when
I do more stress testing, I will reopen.

