Bug 796765 - heartbeats not reliable as means of detecting loss of network in c++ client
Summary: heartbeats not reliable as means of detecting loss of network in c++ client
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 3.0
Target Release: ---
Assignee: Andrew Stitcher
QA Contact: Chuck Rolke
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-23 15:54 UTC by Gordon Sim
Modified: 2014-09-24 15:04 UTC

Fixed In Version: qpid-cpp-0.22-4.el6, qpid-cpp-0.22-4.el5
Doc Type: Bug Fix
Doc Text:
The qpid C++ client did not disconnect from a broker whose heartbeats had timed out if the client's socket was not writable at that moment. This mainly affected clients sending large messages: the heartbeat timeout was detected, but the connection was not torn down. With this fix, clients that are sending large messages disconnect correctly on a heartbeat timeout.
Clone Of:
: 835119
Environment:
Last Closed: 2014-09-24 15:04:03 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Apache JIRA | QPID-3828 | 0 | None | None | None | 2012-08-10 01:14:39 UTC
Red Hat Bugzilla | 835119 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Product Errata | RHEA-2014:1296 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Messaging 3.0 Release | 2014-09-24 19:00:06 UTC

Internal Links: 835119

Description Gordon Sim 2012-02-23 15:54:38 UTC
Heartbeats fail to trigger a disconnect in particular when sending larger messages (see https://issues.apache.org/jira/browse/QPID-3828), or in any other situation that leaves the socket non-writable (i.e. its buffers are full) at the point the client detects the lack of heartbeats.
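
For readers unfamiliar with the "non-writable (buffers full)" condition, here is a minimal POSIX sketch that reproduces it on a local socket pair. It does not use qpid at all. The helper names isWritable, fillSendBuffer and makeNonBlockingPair are made up for this illustration, and the 4096-byte chunk size is arbitrary.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if poll() reports fd as writable right now (zero timeout).
bool isWritable(int fd) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLOUT;
    return poll(&p, 1, 0) == 1 && (p.revents & POLLOUT) != 0;
}

// Writes to a non-blocking socket until the kernel refuses more data.
// Returns the number of bytes queued, or -1 on an unexpected error.
long fillSendBuffer(int fd) {
    char chunk[4096] = {};
    long total = 0;
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, 0);
        if (n > 0) { total += n; continue; }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return total;
        if (n < 0 && errno == EINTR) continue;
        return -1;  // any other error, or an unexpected zero-byte send
    }
}

// Creates a connected local socket pair whose first end is non-blocking.
// The second end is never read in this sketch, so the buffers can fill up.
bool makeNonBlockingPair(int fds[2]) {
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
    int flags = fcntl(fds[0], F_GETFL, 0);
    return flags != -1 && fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == 0;
}
```

After makeNonBlockingPair(fds) and fillSendBuffer(fds[0]), isWritable(fds[0]) stays false until the peer reads some data. A peer that has dropped off the network never reads, so any cleanup that waits for a writable event on such a socket would wait indefinitely.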

Comment 1 Ed Santiago 2012-06-22 22:01:32 UTC
This might not be related, but it's the closest approximation to the symptoms I'm seeing. In short, heartbeat does not detect a dropped connection.

1) Listener client on my laptop, connecting (via VPN) to remote broker. Heartbeat set to 5.

2) Drop VPN. Leave it off for 10 seconds or 2 hours. Client will not disconnect.

3) Reconnect VPN. Client will still not disconnect.

Am I using heartbeat incorrectly? Are my expectations wrong? I expect the heartbeat option to cause a listener client to abort.

qpid-cpp-0.16-1.fc16.1, and I even tried rebuilding with the patch in your 11/Feb/12 17:08 comment on the QPID-3828 issue. No joy.

Full command and output:

   $ QPID_SSL_CERT_DB=[...]/certdb QPID_LOG_ENABLE=trace+   \
       ./drain -f  --broker [snip]:5671                     \
       --connection-options '{ sasl-mechanism:GSSAPI,       \
                               transport: ssl,              \
                               heartbeat: 5 }'              \
         'tmp.esm-test; { create:receiver, node: { type: queue, durable: False, x-declare: { exclusive: True, auto-delete: True }, x-bindings: [ { exchange: "standard.topic", queue: "tmp.esm-test", key: "something.#" }]}}'
   [...]
   2012-06-22 15:44:07 trace RECV [[53016 10.16.36.223:5671]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
   2012-06-22 15:44:07 trace SENT [[53016 10.16.36.223:5671]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
   [***this is where I disconnect VPN***]
   2012-06-22 15:44:17 debug Traffic timeout
   [***that's it. Drain process does not terminate, even after many hours***]

Clearly there's _some_ detection going on. The Traffic timeout message is in src/qpid/client/ConnectionImpl.cpp, and the code proceeds to call idleIn() which in turn calls connector->abort(), but something is getting stuck somewhere.

100% reproducible.
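
The log above shows "Traffic timeout" 10 seconds after the last heartbeat frame, with heartbeat set to 5. Below is a minimal sketch of an idle watchdog with that timing. It is not the qpid-cpp implementation: IdleWatchdog, trafficSeen and check are hypothetical names, and the factor of two is an assumption inferred only from this one log. What it illustrates is the behaviour I expected: the timeout callback runs from the timer check itself and does not depend on the socket becoming writable.

```cpp
#include <chrono>
#include <functional>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical idle watchdog; none of these names come from qpid-cpp.
class IdleWatchdog {
public:
    // heartbeat: negotiated heartbeat interval.
    // onTimeout: called once when no traffic has been seen for too long.
    IdleWatchdog(std::chrono::seconds heartbeat,
                 Clock::time_point start,
                 std::function<void()> onTimeout)
        : limit_(2 * heartbeat),  // assumed factor, inferred from the log
          lastTraffic_(start),
          onTimeout_(std::move(onTimeout)) {}

    // Record that a frame (heartbeat or otherwise) arrived at time `now`.
    void trafficSeen(Clock::time_point now) { lastTraffic_ = now; }

    // Called from a timer. Fires the callback at most once, regardless of
    // whether the socket is readable or writable. Returns true once fired.
    bool check(Clock::time_point now) {
        if (!fired_ && now - lastTraffic_ >= limit_) {
            fired_ = true;
            onTimeout_();
        }
        return fired_;
    }

private:
    std::chrono::seconds limit_;
    Clock::time_point lastTraffic_;
    std::function<void()> onTimeout_;
    bool fired_ = false;
};
```

With heartbeat 5 and the last frame at time t, check(t + 9s) does nothing and check(t + 10s) invokes the callback, matching the timestamps in the trace. In the real client the callback would have to tear the connection down, which is the step that appears to get stuck.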

Comment 2 Gordon Sim 2012-06-25 08:44:20 UTC
This does indeed look like a similar issue, although in this case with the SSL transport.

Comment 3 Andrew Stitcher 2012-06-25 15:26:27 UTC
Created new bug for Ed's reported bug which is a different problem.

Comment 4 Andrew Stitcher 2012-06-25 15:27:11 UTC
(In reply to comment #3)
> Created new bug for Ed's reported bug which is a different problem.

Bug 835119

Comment 5 Andrew Stitcher 2013-04-25 14:47:55 UTC
This is now fixed upstream on trunk in r1475803 (on track for 0.24)

Comment 9 errata-xmlrpc 2014-09-24 15:04:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1296.html

