| Summary: | heartbeats not reliable as means of detecting loss of network in c++ client | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Gordon Sim <gsim> | |
| Component: | qpid-cpp | Assignee: | Andrew Stitcher <astitcher> | |
| Status: | CLOSED ERRATA | QA Contact: | Chuck Rolke <crolke> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 2.0 | CC: | crolke, jross, lzhaldyb, santiago | |
| Target Milestone: | 3.0 | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | qpid-cpp-0.22-4.el6, qpid-cpp-0.22-4.el5 | Doc Type: | Bug Fix | |
| Doc Text: | It was discovered that the qpid C++ client would not disconnect from a broker that had timed out its heartbeats, if the socket it was using was not writable. This caused issues when sending large messages because the client didn't disconnect correctly upon receiving a heartbeat timeout. The fix corrects this behavior, and clients that are sending large messages correctly disconnect on heartbeat timeouts. | Story Points: | --- | |
| Clone Of: | ||||
| : | 835119 | Environment: | |
| Last Closed: | 2014-09-24 15:04:03 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
Description
Gordon Sim
2012-02-23 15:54:38 UTC
This might not be related, but it's the closest approximation to the symptoms I'm seeing. In short, heartbeat does not detect a dropped connection.
1) Listener client on my laptop, connecting (via VPN) to remote broker. Heartbeat set to 5.
2) Drop VPN. Leave it off for 10 seconds or 2 hours. Client will not disconnect.
3) Reconnect VPN. Client will still not disconnect.
Am I using heartbeat incorrectly? Are my expectations wrong? I expect the heartbeat option to cause a listener client to abort.
I'm using qpid-cpp-0.16-1.fc16.1, and I even tried rebuilding with the patch from your 11/Feb/12 17:08 comment on the QPID-3828 issue. No joy.
Full command and output:
$ QPID_SSL_CERT_DB=[...]/certdb QPID_LOG_ENABLE=trace+ \
./drain -f --broker [snip]:5671 \
--connection-options '{ sasl-mechanism:GSSAPI, \
transport: ssl, \
heartbeat: 5 }' \
'tmp.esm-test; { create:receiver, node: { type: queue, durable: False, x-declare: { exclusive: True, auto-delete: True }, x-bindings: [ { exchange: "standard.topic", queue: "tmp.esm-test", key: "something.#" }]}}'
[...]
2012-06-22 15:44:07 trace RECV [[53016 10.16.36.223:5671]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2012-06-22 15:44:07 trace SENT [[53016 10.16.36.223:5671]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
[***this is where I disconnect VPN***]
2012-06-22 15:44:17 debug Traffic timeout
[***that's it. Drain process does not terminate, even after many hours***]
Clearly there's _some_ detection going on. The Traffic timeout message is in src/qpid/client/ConnectionImpl.cpp, and the code proceeds to call idleIn() which in turn calls connector->abort(), but something is getting stuck somewhere.
100% reproducible.
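For readers following the "Traffic timeout" path described above, here is a minimal sketch of the behavior one would expect from a heartbeat watchdog: once no traffic has been seen for the timeout period, the connection is torn down whether or not the socket is writable. Every name here (`HeartbeatMonitor`, `FakeTransport`, `trafficSeen`, `check`) is hypothetical and is not taken from qpid-cpp. The timeout of twice the heartbeat interval is an assumption, chosen because it matches the 10-second gap between the last heartbeat and the timeout message in the log above.

```cpp
#include <chrono>
#include <functional>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a transport; NOT a qpid-cpp class.
struct FakeTransport {
    bool writable = false;  // models a socket that cannot currently be written to
    bool closed = false;

    // Tears the connection down immediately; deliberately ignores `writable`.
    void abort() { closed = true; }
};

// Hypothetical watchdog: declares the peer dead after 2 * interval of silence
// (the factor of 2 is an assumption, see the lead-in).
class HeartbeatMonitor {
public:
    HeartbeatMonitor(std::chrono::seconds interval, Clock::time_point start,
                     std::function<void()> onTimeout)
        : timeout_(2 * interval),
          lastTraffic_(start),
          onTimeout_(std::move(onTimeout)) {}

    // Call whenever any frame (heartbeat or data) arrives from the peer.
    void trafficSeen(Clock::time_point now) { lastTraffic_ = now; }

    // Call periodically from a timer. Fires the callback once, on the first
    // check at or past the deadline, and returns true from then on.
    bool check(Clock::time_point now) {
        if (!expired_ && now - lastTraffic_ >= timeout_) {
            expired_ = true;
            onTimeout_();
        }
        return expired_;
    }

private:
    std::chrono::seconds timeout_;
    Clock::time_point lastTraffic_;
    std::function<void()> onTimeout_;
    bool expired_ = false;
};
```

A caller would construct the monitor with a callback such as `[&t] { t.abort(); }` and invoke `check()` from a timer. The point of the sketch is that the abort path does not consult the socket's writable state, which is the condition under which the client in this report got stuck.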
This does indeed look like a similar issue, although in this case with the SSL transport.
Created new bug for Ed's reported bug which is a different problem.
(In reply to comment #3)
> Created new bug for Ed's reported bug which is a different problem.
Bug 835119
This is now fixed upstream on trunk in r1475803 (on track for 0.24).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-1296.html