Bug 240651

Summary: After stress testing, xend becomes unresponsive to socket connections
Product: Fedora
Reporter: Richard W.M. Jones
Component: xen
Version: rawhide
Description Richard W.M. Jones 2007-05-19 07:48:25 EDT
Description of problem:

After overnight stress testing, xend no longer responds to connections on its
control socket /var/run/xend/xmlrpc.sock.  This means that basic commands such
as 'xm list' hang.

Methodology of stress tests: http://et.redhat.com/~rjones/xen-stress-tests/

Version-Release number of selected component (if applicable):


How reproducible:

Happened at least twice on my test machine.  Pretty reproducible if I run the
stress tests for an extended period of time.

Steps to Reproduce:
1. Run the stress tests with 8 guests

Actual results:

virt-manager hangs (no updates, cannot be closed).

xm list hangs.

strace of xm list shows:

23848 connect(3, {sa_family=AF_FILE, path="/var/run/xend/xmlrpc.sock"}, 27) = 0
23848 sendto(3, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1
 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\
n", 132, 0, NULL, 0) = 132
23848 sendto(3, "<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains
lue><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 268, 0, NULL, 0
) = 268
23848 recvfrom(3, 0x2aaaaab3a7d4, 1, 0, 0, 0) = ? ERESTARTSYS (To be restarted)

(the final recvfrom hangs - here I hit ^C).
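
To tell a hung xend apart from a slow one without blocking the way 'xm list'
does, a probe with a socket timeout can be used.  The following is only a
minimal sketch, not part of xend or xm: the function name probe(), the
10 second default timeout and the choice of system.listMethods as the request
are assumptions made for illustration.  Only the socket path and the
POST /RPC2 request line come from the trace above.

```python
import socket

XEND_SOCKET = "/var/run/xend/xmlrpc.sock"  # control socket path from this report

# Assumption: any well-formed XML-RPC call is enough to provoke some reply
# (a result or a fault).  system.listMethods is a common introspection
# method; whether xend implements it is not confirmed here.
BODY = (
    b"<?xml version='1.0'?>\n<methodCall>\n"
    b"<methodName>system.listMethods</methodName>\n"
    b"<params>\n</params>\n</methodCall>\n"
)


def probe(path=XEND_SOCKET, timeout=10.0):
    """Send one request on a Unix socket; return 'ok', 'timeout' or 'error: ...'."""
    request = (
        b"POST /RPC2 HTTP/1.0\r\n"
        b"Content-Type: text/xml\r\n"
        b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n\r\n"
    ) + BODY
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to connect, send and recv
    try:
        sock.connect(path)
        sock.sendall(request)
        data = sock.recv(1)  # the same first read that hangs in the strace
        return "ok" if data else "error: connection closed without a reply"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc
    finally:
        sock.close()
```

Calling probe() with no arguments targets the xend socket.  A result of
"timeout" would match the symptom described here (connect and send succeed,
but no byte of the reply ever arrives), while "error: ..." would point to a
missing socket or a refused connection instead.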

strace of xend shows:

23849 recvfrom(35, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.
0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n
\r\n<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</
nt></value>\n</param>\n</params>\n</methodCall>\n", 8192, 0, NULL, NULL) = 400
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9d10c0, FUTEX_WAIT, 0, NULL <unfinished ...>

Expected results:

xend should not hang.

Additional info:

Restarting xend fixes the problem.
Comment 1 Richard W.M. Jones 2007-11-19 10:34:32 EST
Setting this to NEEDINFO from myself - I need to retest whether this still
happens with a more recent xend.
Comment 3 Richard W.M. Jones 2008-04-04 06:16:59 EDT
I have not seen this one for a very long time.  If it recurs when
I do more stress testing, I will reopen.