Bug 240651 - After stress testing, xend becomes unresponsive to socket connections
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: xen
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Daniel Berrange
Whiteboard: bzcl34nup
Reported: 2007-05-19 07:48 EDT by Richard W.M. Jones
Modified: 2008-04-04 06:16 EDT
Doc Type: Bug Fix
Last Closed: 2008-04-04 06:16:59 EDT


Description Richard W.M. Jones 2007-05-19 07:48:25 EDT
Description of problem:

After overnight stress testing, xend no longer responds to connections on its
control socket /var/run/xend/xmlrpc.sock.  This means that basic commands such
as 'xm list' hang.

Methodology of stress tests: http://et.redhat.com/~rjones/xen-stress-tests/

Version-Release number of selected component (if applicable):

xen-3.1.0-0.rc7.1.fc7

How reproducible:

Happened at least twice on my test machine.  Fairly reproducible when the
stress tests run for an extended period of time.

Steps to Reproduce:
1. Run the stress tests with 8 guests
(http://et.redhat.com/~rjones/xen-stress-tests/).
 
Actual results:

virt-manager hangs (no updates, cannot be closed).

xm list hangs.

strace of xm list shows:

23848 connect(3, {sa_family=AF_FILE, path="/var/run/xend/xmlrpc.sock"}, 27) = 0
23848 sendto(3, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\n", 132, 0, NULL, 0) = 132
23848 sendto(3, "<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</methodName>\n<params>\n<param>\n<value><boolean>1</boolean></value>\n</param>\n<param>\n<value><string>all</string></value>\n</param>\n<param>\n<value><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 268, 0, NULL, 0) = 268
23848 recvfrom(3, 0x2aaaaab3a7d4, 1, 0, 0, 0) = ? ERESTARTSYS (To be restarted)

(the final recvfrom hangs - here I hit ^C).
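
For anyone retesting, the request in the trace above can be replayed with a
receive timeout, so that a wedged xend shows up as a status rather than as a
hung client. This is only a minimal sketch in modern Python 3, not what xm
itself does: the function names probe_xend and build_request and the 5 second
timeout are made up for illustration. Only the socket path, the headers and
the method call are taken from the strace.

```python
import socket
import xmlrpc.client

# Socket path as seen in the strace of 'xm list'.
XEND_SOCKET = "/var/run/xend/xmlrpc.sock"


def build_request(method="xend.domains_with_state", params=(True, "all", 0)):
    """Return the raw HTTP/1.0 XML-RPC request bytes seen in the trace."""
    body = xmlrpc.client.dumps(tuple(params), methodname=method).encode("utf-8")
    header = (
        "POST /RPC2 HTTP/1.0\r\n"
        "Host: \r\n"
        "User-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\n"
        "Content-Type: text/xml\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % len(body)
    ).encode("ascii")
    return header + body


def probe_xend(path=XEND_SOCKET, timeout=5.0):
    """Hypothetical helper: classify the daemon as ok/closed/hung/unreachable."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # the timeout value is an arbitrary assumption
    try:
        sock.connect(path)
        sock.sendall(build_request())
        # xm blocked here forever; with a timeout we give up instead.
        return "ok" if sock.recv(1) else "closed"
    except socket.timeout:
        return "hung"
    except OSError:
        return "unreachable"
    finally:
        sock.close()


if __name__ == "__main__":
    print(probe_xend())
```

A result of "hung" corresponds to the symptom reported here: the connect and
both sends succeed, but no byte of the reply ever arrives.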

strace of xend shows:

23849 recvfrom(35, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\n<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</methodName>\n<params>\n<param>\n<value><boolean>1</boolean></value>\n</param>\n<param>\n<value><string>all</string></value>\n</param>\n<param>\n<value><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 8192, 0, NULL, NULL) = 400
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1)    = 0
23849 futex(0x9d10c0, FUTEX_WAIT, 0, NULL <unfinished ...>
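
The final FUTEX_WAIT with a NULL timeout is consistent with the handler thread
blocking on a lock or semaphore that is never released, although the trace
does not show which one. As a rough sketch of how such a hang could be
narrowed down in a Python daemon, the following dumps the Python stack of
every live thread. It is written for modern Python 3 and is not part of xend;
the helper name dump_thread_stacks and the demo lock are made up for
illustration.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text dump of the current Python stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    out = []
    for ident, frame in sys._current_frames().items():
        out.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        out.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(out)


if __name__ == "__main__":
    # Stand-in for a lock that some other code path never releases.
    stuck_lock = threading.Lock()
    stuck_lock.acquire()

    def handler():
        # Blocks indefinitely, much like the FUTEX_WAIT in the trace.
        with stuck_lock:
            pass

    t = threading.Thread(target=handler, name="rpc-handler", daemon=True)
    t.start()
    t.join(0.2)  # give the thread time to reach the lock
    print(dump_thread_stacks())
```

In the dump, the blocked thread's last frame points at the line that is
waiting for the lock, which is the information the strace alone cannot give.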


Expected results:

xend should not hang.

Additional info:

Restarting xend fixes the problem.
Comment 1 Richard W.M. Jones 2007-11-19 10:34:32 EST
Setting NEEDINFO on myself - I need to retest whether this is
still happening with a more recent xend.
Comment 2 Bug Zapper 2008-04-03 20:47:38 EDT
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
Comment 3 Richard W.M. Jones 2008-04-04 06:16:59 EDT
I have not seen this one for a very long time.  If it recurs when
I do more stress testing, I will reopen.
