Description of problem:
After overnight stress testing, xend no longer responds to connections on its control socket /var/run/xend/xmlrpc.sock. This means that basic commands such as 'xm list' hang.

Methodology of stress tests: http://et.redhat.com/~rjones/xen-stress-tests/

Version-Release number of selected component (if applicable):
xen-3.1.0-0.rc7.1.fc7

How reproducible:
Happened at least twice on my test machine. Pretty reproducible if I run the stress tests for an extended period of time.

Steps to Reproduce:
1. Run the stress tests with 8 guests (http://et.redhat.com/~rjones/xen-stress-tests/).

Actual results:
virt-manager hangs (no updates, cannot be closed). xm list hangs.

strace of xm list shows:

23848 connect(3, {sa_family=AF_FILE, path="/var/run/xend/xmlrpc.sock"}, 27) = 0
23848 sendto(3, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\n", 132, 0, NULL, 0) = 132
23848 sendto(3, "<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</methodName>\n<params>\n<param>\n<value><boolean>1</boolean></value>\n</param>\n<param>\n<value><string>all</string></value>\n</param>\n<param>\n<value><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 268, 0, NULL, 0) = 268
23848 recvfrom(3, 0x2aaaaab3a7d4, 1, 0, 0, 0) = ? ERESTARTSYS (To be restarted)

(The final recvfrom hangs; here I hit ^C.)

strace of xend shows:

23849 recvfrom(35, "POST /RPC2 HTTP/1.0\r\nHost: \r\nUser-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\nContent-Type: text/xml\r\nContent-Length: 268\r\n\r\n<?xml version=\'1.0\'?>\n<methodCall>\n<methodName>xend.domains_with_state</methodName>\n<params>\n<param>\n<value><boolean>1</boolean></value>\n</param>\n<param>\n<value><string>all</string></value>\n</param>\n<param>\n<value><int>0</int></value>\n</param>\n</params>\n</methodCall>\n", 8192, 0, NULL, NULL) = 400
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9c5ba0, FUTEX_WAKE, 1) = 0
23849 futex(0x9d10c0, FUTEX_WAIT, 0, NULL <unfinished ...>

Expected results:
xend should not hang.

Additional info:
Restarting xend fixes the problem.
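Since 'xm list' blocks forever in recvfrom() once xend stops answering, a monitoring script could instead probe the control socket with a timeout. Below is a minimal sketch, assuming Python 3's xmlrpc.client rather than the old xmlrpclib seen in the trace; the helper names build_request and probe_xend are hypothetical. The socket path, the method xend.domains_with_state and its parameters (True, "all", 0) are taken from the strace above. It rebuilds essentially the same HTTP/1.0 POST (omitting the User-Agent header) and returns None if xend sends nothing back in time. It only detects the hang; it says nothing about why xend ends up in FUTEX_WAIT.

```python
import socket
import xmlrpc.client

# Control socket path as seen in the strace of 'xm list'.
XEND_SOCKET = "/var/run/xend/xmlrpc.sock"


def build_request(method, params):
    """Build an HTTP/1.0 XML-RPC POST similar to the one in the strace.

    Hypothetical helper; the User-Agent header from the trace is omitted.
    """
    body = xmlrpc.client.dumps(params, methodname=method).encode("utf-8")
    header = (
        "POST /RPC2 HTTP/1.0\r\n"
        "Host: \r\n"
        "Content-Type: text/xml\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + body


def probe_xend(path=XEND_SOCKET, timeout=5.0):
    """Send xend.domains_with_state and return the raw reply bytes.

    Returns None if nothing at all arrives within `timeout` seconds,
    instead of blocking indefinitely like the recvfrom() in the trace.
    """
    request = build_request("xend.domains_with_state", (True, "all", 0))
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(request)
        chunks = []
        try:
            while True:
                data = sock.recv(8192)
                if not data:
                    break  # HTTP/1.0: server closes the connection after replying
                chunks.append(data)
        except socket.timeout:
            # No data at all means xend did not answer; partial data is returned as-is.
            return b"".join(chunks) or None
        return b"".join(chunks)
```

A cron job could call probe_xend() periodically and flag the machine when it returns None; the timeout value is arbitrary and would need tuning for a heavily loaded host.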
Setting NEEDINFO on myself: I need to retest whether this still happens with a more recent xend.
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
I haven't seen this one for a very long time. If it recurs when I do more stress testing, I will reopen it.