Bug 526299
| Summary: | the clustered broker seems to sometimes not send a close-ok before shutting down the socket | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Rafael H. Schloming <rafaels> |
| Component: | qpid-cpp | Assignee: | Alan Conway <aconway> |
| Status: | CLOSED ERRATA | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 1.2 | CC: | aconway, cctrieloff, tross |
| Target Milestone: | 1.3 | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Previously, the Python client reported a connection aborted error when a socket was closed prior to receiving a 'ConnectionCloseOk' control. This happened intermittently when run against a clustered broker. With this update, the 'ConnectionCloseOk' control is always sent before the socket is closed. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-10-14 16:15:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Fixed in r820783. There was a race where the broker could close the socket before flushing all the frames in the write queue.
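The race can be pictured with a minimal model (illustrative Python only, not the actual qpid-cpp code): frames queued for output are lost if the socket is torn down before the write queue is drained, so the fix is simply to flush before closing. The `Connection` class and its method names here are hypothetical.

```python
# Hypothetical model of the race fixed in r820783: a connection with an
# in-memory write queue. Closing the "socket" before draining the queue
# drops the final ConnectionCloseOk control.
from collections import deque

class Connection:
    def __init__(self):
        self.write_queue = deque()  # frames queued but not yet written
        self.sent = []              # frames that actually reached the wire
        self.closed = False

    def queue_frame(self, frame):
        self.write_queue.append(frame)

    def flush(self):
        # drain every queued frame to the wire
        while self.write_queue:
            self.sent.append(self.write_queue.popleft())

    def close_buggy(self):
        # pre-fix behaviour: socket shut down with frames still queued
        self.write_queue.clear()
        self.closed = True

    def close_fixed(self):
        # post-fix behaviour: flush all pending frames, then shut down
        self.flush()
        self.closed = True

buggy = Connection()
buggy.queue_frame("ConnectionCloseOk")
buggy.close_buggy()
assert "ConnectionCloseOk" not in buggy.sent  # client sees an abort

fixed = Connection()
fixed.queue_frame("ConnectionCloseOk")
fixed.close_fixed()
assert "ConnectionCloseOk" in fixed.sent      # clean close dialog
```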
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Previously, the python client reported a connection aborted error when a socket was closed prior to receiving a 'ConnectionCloseOk' control. This happened intermittently when run against a clustered broker. With this update, the 'ConnectionCloseOk' control is always send before the shutdown of a socket.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
```
@@ -1 +1 @@
-Previously, the python client reported a connection aborted error when a socket was closed prior to receiving a 'ConnectionCloseOk' control. This happened intermittently when run against a clustered broker. With this update, the 'ConnectionCloseOk' control is always send before the shutdown of a socket.
+Previously, the Python client reported a connection aborted error when a socket was closed prior to receiving a 'ConnectionCloseOk' control. This happened intermittently when run against a clustered broker. With this update, the 'ConnectionCloseOk' control is always sent before the socket is closed.
```
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0773.html
Description of problem:

The python client reports a connection aborted error if the socket is closed prior to receiving a ConnectionCloseOk control. This seems to happen intermittently when run against a clustered broker. Running the client tests in a loop and capturing logging seems to confirm that the broker is sometimes omitting the ConnectionCloseOk control from the normal shutdown dialog.

Version-Release number of selected component (if applicable): Qpid trunk.

How reproducible: It's intermittent on any single test run, but running the tests in a loop will reproduce the problem before too long.

Steps to Reproduce:
1. Start a clustered broker with logging enabled.
2. Run the following loop from the python directory:

   ```
   # run until a failure occurs
   while ./qpid-python-test qpid.tests.* -Ddelay=0.1 -v DEBUG -H > /tmp/out.txt; do true; done
   ```

3. Examine /tmp/out.txt for the client logs.
4. Examine the broker logs.

Actual results:

The end of /tmp/out.txt should look something like this:

```
2009-09-29 12:54:06,444 DEBUG SENT: SessionAttach(name='c1165341-a509-48e2-b150-aa8f1fb4e9a9:0')
2009-09-29 12:54:06,444 DEBUG SENT: SessionCommandPoint(command_id=serial(0), command_offset=0)
2009-09-29 12:54:06,444 DEBUG SENT: SessionDetach(name='c1165341-a509-48e2-b150-aa8f1fb4e9a9:0')
2009-09-29 12:54:06,445 DEBUG SENT: ConnectionClose(reply_code=200)
2009-09-29 12:54:06,449 DEBUG SENT: '\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x01\x01\x00\x00&c1165341-a509-\
48e2-b150-aa8f1fb4e9a9:0\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x0\
0\x00\x00\x00\x00\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x03\x01\x00\x00&c1165341-a509-48e2-b150-aa8f1fb4e9a9\
:0\x0f\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0b\x01\x00\x00\xc8'
2009-09-29 12:54:06,451 DEBUG READ: '\x0f\x00\x008\x00\x00\x00\x00\x00\x00\x00\x00\x02\x02\x01\x00\x00&c1165341-a509-\
48e2-b150-aa8f1fb4e9a9:0\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x0\
0\x00\x00\x00\x00\x0f\x00\x009\x00\x00\x00\x00\x00\x00\x00\x00\x02\x04\x03\x00\x00&c1165341-a509-48e2-b150-aa8f1fb4e9a9\
:0\x00'
2009-09-29 12:54:06,452 DEBUG RCVD: SessionAttached(name='c1165341-a509-48e2-b150-aa8f1fb4e9a9:0')
2009-09-29 12:54:06,452 DEBUG RCVD: SessionCommandPoint(command_id=serial(0), command_offset=0)
2009-09-29 12:54:06,452 DEBUG RCVD: SessionDetached(name='c1165341-a509-48e2-b150-aa8f1fb4e9a9:0', code=0)
2009-09-29 12:54:06,454 DEBUG ABORTED: <socket._socketobject object at 0x9308fb4>
qpid.tests.messaging.SessionTests.testStop .............................. fail
Error during teardown:
Traceback (most recent call last):
  File "./qpid-python-test", line 301, in run
    phase()
  File "/home/rhs/qpid/python/qpid/tests/messaging.py", line 57, in teardown
    self.conn.close()
  File "<string>", line 6, in close
  File "/home/rhs/qpid/python/qpid/messaging.py", line 228, in close
    self.disconnect()
  File "<string>", line 6, in disconnect
  File "/home/rhs/qpid/python/qpid/messaging.py", line 194, in disconnect
    self._ewait(lambda: not self._driver._connected)
  File "/home/rhs/qpid/python/qpid/messaging.py", line 144, in _ewait
    self._check_error(exc)
  File "/home/rhs/qpid/python/qpid/messaging.py", line 140, in _check_error
    raise exc(*self.error)
ConnectionError: connection aborted
```

The exact test that fails may vary as there is a chance this will occur during any connection close.

The broker logs should look something like this:

```
2009-09-29 12:54:06 trace SENT [127.0.0.1:43395]: Frame[BEbe; channel=0; {SessionAttachedBody: name=c1165341-a509-48e2-b150-aa8f1fb4e9a9:0; }]
2009-09-29 12:54:06 trace SENT [127.0.0.1:43395]: Frame[BEbe; channel=0; {SessionCommandPointBody: command-id=0; command-offset=0; }]
2009-09-29 12:54:06 trace SENT [127.0.0.1:43395]: Frame[BEbe; channel=0; {SessionDetachedBody: name=c1165341-a509-48e2-b150-aa8f1fb4e9a9:0; code=0; }]
2009-09-29 12:54:06 debug cluster(192.168.1.101:12528 READY) local close of replicated connection 192.168.1.101:12528-90(local)
2009-09-29 12:54:06 trace MCAST 192.168.1.101:12528-90: {ClusterConnectionDeliverCloseBody: }
2009-09-29 12:54:06 trace cluster(192.168.1.101:12528 READY) DLVR: Event[192.168.1.101:12528-90 Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=1; }]]
2009-09-29 12:54:06 trace cluster(192.168.1.101:12528 READY) DLVR 7453: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=1; }] control 192.168.1.101:12528-90
2009-09-29 12:54:06 trace cluster(192.168.1.101:12528 READY) DLVR: Event[192.168.1.101:12528-90 Frame[BEbe; channel=0; {ClusterConnectionDeliverCloseBody: }]]
2009-09-29 12:54:06 trace cluster(192.168.1.101:12528 READY) DLVR 7454: Frame[BEbe; channel=0; {ClusterConnectionDeliverCloseBody: }] control 192.168.1.101:12528-90
2009-09-29 12:54:06 info cluster(192.168.1.101:12528 READY) connection closed 192.168.1.101:12528-90
2009-09-29 12:54:06 debug cluster(192.168.1.101:12528 READY) deleted connection: 192.168.1.101:12528-90(local)
```

Expected results: The loop should be able to run indefinitely without failure.

Additional info: On one run I saw the SessionDetached control omitted as well.
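Why the client reports "connection aborted" can be sketched as follows (a hypothetical model of the close handshake, not the real qpid.messaging internals; the function and class names are illustrative): after sending ConnectionClose, the client waits for a ConnectionCloseOk control, and an EOF on the socket before that control arrives surfaces as the aborted-connection error seen in the traceback above.

```python
# Hypothetical sketch of the client-side close handshake. The client has
# already sent ConnectionClose(reply_code=200); it now reads frames until
# either ConnectionCloseOk arrives or the socket closes (end of frames).

class ConnectionError(Exception):
    """Stand-in for the client's connection error type."""

def close_handshake(frames_from_broker):
    """frames_from_broker: frames read before the socket hit EOF."""
    for frame in frames_from_broker:
        if frame == "ConnectionCloseOk":
            return "clean close"
    # socket closed without the expected control -> abort reported
    raise ConnectionError("connection aborted")

# Normal shutdown dialog: broker flushes CloseOk before closing.
assert close_handshake(["SessionDetached", "ConnectionCloseOk"]) == "clean close"

# Buggy dialog: broker shut the socket before CloseOk reached the wire.
try:
    close_handshake(["SessionDetached"])
except ConnectionError as e:
    assert str(e) == "connection aborted"
```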