1151446 – Heartbeats for AMQP 1.0 Qpid Messaging API

Summary: Heartbeats for AMQP 1.0 Qpid Messaging API

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	unspecified
Target Milestone:	3.2
Target Release:	---
Assignee:	Ken Giusti
QA Contact:	Michal Toth
Docs Contact:
URL:
Whiteboard:
Depends On:	1249942
Blocks:	785156

Reported:	2014-10-10 12:25 UTC by Justin Ross
Modified:	2015-10-08 13:09 UTC
CC List:	8 users
Fixed In Version:	qpid-cpp-0.34-3
Doc Type:	Enhancement
Doc Text:	When an AMQP 1.0 client advertises a connection idle timeout value, the broker uses twice that value as its idle timeout threshold for that connection. This is consistent with the existing 0-10 behavior. As a result, the broker terminates any client that has been idle for twice the client's advertised idle timeout. In this context, __advertised__ means the value sent as the idle timeout in the open frame. Most clients advertise half their configured timeout, but a client may advertise its full configured timeout.

The client's timeout is set by passing the 'heartbeat' option to the `qpid::messaging::Connection` constructor. The value is expressed in seconds. For example, you can pass this option via the `--connection-options` command-line argument to `qpid-receive`:

----
qpid-receive --connection-options "{protocol: amqp1.0, heartbeat: 10}" ....
----

Note that the broker does not support configuring a default idle timeout on the broker side. The idle timeout value for the connection is determined entirely by the configuration of the connection made by the client, and there is no way to override this value on the broker side via management.
Clone Of:
Environment:	MPR: Document the heartbeat option wherever the Connection constructor is documented.
Last Closed:	2015-10-08 13:09:22 UTC
Target Upstream Version:
Embargoed:

Attachments
qpid cpp trace log (16.46 KB, text/plain) 2015-07-03 13:19 UTC, Michal Toth
Qpid broker trace log (87.74 KB, text/plain) 2015-07-03 13:21 UTC, Michal Toth

Links
System	ID	Priority	Status	Summary	Last Updated
Apache JIRA	QPID-5538	None	None	None	Never
Apache JIRA	QPID-6698	None	None	None	Never
Red Hat Product Errata	RHEA-2015:1879	normal	SHIPPED_LIVE	Red Hat Enterprise MRG Messaging 3.2 Release	2015-10-08 17:07:53 UTC

Description Justin Ross 2014-10-10 12:25:02 UTC

Comment 1 Ken Giusti 2015-02-02 16:54:24 UTC

Upstream patch:

https://svn.apache.org/viewvc?view=revision&revision=1656505

Comment 2 Ken Giusti 2015-02-06 21:16:15 UTC

Turns out this patch does not work with Proton 0.7.  The return code from pn_transport_input/output changed since 0.7.

The following update adds support for proton 0.7:

https://svn.apache.org/viewvc?view=revision&revision=1657964

Comment 4 Ken Giusti 2015-04-24 12:45:08 UTC

This change enables idle timeout support on the broker.

The new operational behavior:

When an AMQP 1.0 client advertises a connection idle timeout value, 2x that value is used by the broker as its idle timeout threshold for that connection.  This is consistent with the existing 0-10 behavior.   This causes the broker to terminate any client that has been idle for 2x the value of the client's idle timeout.  To be clear: 'advertised' means the value sent as the idle timeout in the open frame.  Most clients advertise 1/2 their configured timeout, but a client may advertise 1x the configured timeout.

To test, create a queue consumer that waits forever for messages on an empty queue.  Use the client's 'heartbeat' connection option (assuming a qpid::messaging client speaking 1.0).  Once the connection to the broker is established, send a SIGSTOP to the client.  This will prevent the client from generating idle frames.

After 2x the client's advertised interval, the broker should terminate the connection to the client.

If debug logging is enabled, the broker will log the actual values it uses for the idle timeout on a connection.

Comment 5 Zdenek Kraus 2015-04-24 14:30:54 UTC

To be sure that I understand correctly:
Is the client free to choose the idle timeout value?
If so, how is it done in code?
Does the broker have any leverage to enforce a min/max value?

Thank you for your valuable input.

Comment 6 Ken Giusti 2015-04-24 17:32:46 UTC

(In reply to Zdenek Kraus from comment #5)
> To be sure that I understand correctly:
> Is the client free to choose the idle timeout value?

Yes

> If so, how is it done in code?

By passing the 'heartbeat' option to the qpid::messaging::Connection constructor.  The value is expressed in seconds.

You can pass these via the --connection-options cmd line argument to qpid-receive:

qpid-receive --connection-options "{protocol: amqp1.0, heartbeat: 10}" ....

> Does the broker have any leverage to enforce a min/max value?

No, the broker simply uses what the client sent.  Since the client expresses the interval in seconds, the smallest timeout would be 1 second.   The max value is unsigned 32-bit - I think - which would never time out in our lifetimes :)

> 
> Thank you for your valuable input.

Comment 7 Jared MORGAN 2015-06-15 03:55:56 UTC

(In reply to Ken Giusti from comment #6)
> You can pass these via the --connection-options cmd line argument to
> qpid-receive:
> 
> qpid-receive --connection-options "{protocol: amqp1.0, heartbeat: 10}" ....

I've done my usual checks for keywords in this ticket within the MICG and MPR and have discovered that qpid-receive is not mentioned in either guide. Should it be? Where is the best place to view the docs for this?

In the MPR, there is mention of qpid::messaging::Connection in the http://docbuilder.usersys.redhat.com/19948/#Cluster_Failover_in_C2 section. This is the only place the heartbeat option is mentioned.

Heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the heartbeat option. For example:
qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");

Is that info buried a bit too deep in this guide, and should it be made more generic? Is it applicable to other Connection settings?

Comment 8 Ken Giusti 2015-06-15 13:15:50 UTC

(In reply to Jared MORGAN from comment #7)
> I've done my usual checks for keywords in this ticket within the MICG and
> MPR and have discovered that qpid-receive is not mentioned in either guide.
> Should it be? Where is the best place to view the docs for this?
> 

No, qpid-receive isn't shipped as part of the MRG packages.  It is a tool available from upstream - I was using it as an example to help Zdenek test this feature.

> In the MPR, there is mention of qpid::messaging::Connection in the
> http://docbuilder.usersys.redhat.com/19948/#Cluster_Failover_in_C2 section
> This is the only place the heartbeat option is mentioned.
> 
> Heartbeats are disabled by default. You can enable them by specifying a
> heartbeat interval (in seconds) for the connection via the heartbeat option.
> For example:
> qpid::messaging::Connection
> c("node1,node2,node3","{reconnect:true,heartbeat:10}");
> 
> Is that info buried a bit too deep in this guide, and should it be made more
> generic? Is it applicable to other Connection settings?

IMHO, I'd document the heartbeat option where the other connection options are documented.  I tried following the link to Connection that appears lower in this section:

http://docbuilder.usersys.redhat.com/19948/#Hello_World_Walkthrough

but that link fails.   I'd assume the heartbeat documentation would be included wherever the documentation for the Connection constructor is.   I'd link from the Cluster Failover documentation back to the Connection documentation also.

But, again, this is just IMHO.

Comment 10 Michal Toth 2015-07-03 13:17:45 UTC

There is a bug when the system time on the broker machine is changed to the future: the AMQP 1.0 connection is dropped, while the 0-10 connection persists.

Reproducer:
1. Start the broker on machine (A).

2. Connect a client from a different machine (B) to the broker (A) via AMQP 1.0 with heartbeats enabled.
$ ./qc2_drain -b admin/admin.144.11:5672 -f --connection-options "{heartbeat:3, protocol:amqp1.0}" "testq2;{create:always}"

3. On the broker machine (A), change the date to the future and observe the AMQP 1.0 client.
$ date -s @$(($(date '+%s') + 300))

4. The client (B) disconnects from the broker.

Reproducible always.

Note:
Perform step 2 twice before changing the time in step 3: connect the first client using amqp1.0 and the second using the amqp0-10 protocol.
Observe that the client using amqp0-10 survives the time change.

See bz1080165 and the attached log files for more details.

Comment 11 Michal Toth 2015-07-03 13:19:09 UTC

Tested on rhel 7 & 6 both archs
qpid-cpp-server-0.34-1
qpid-cpp-client-0.34-1

Comment 12 Michal Toth 2015-07-03 13:19:55 UTC

Created attachment 1045861 [details]
qpid cpp trace log

Comment 13 Michal Toth 2015-07-03 13:21:29 UTC

Created attachment 1045863 [details]
Qpid broker trace log

Comment 14 Jared MORGAN 2015-07-05 22:52:34 UTC

(In reply to Ken Giusti from comment #8)
> IMHO, I'd document the heartbeat option where the other connection options
> are documented.  I tried following the link to Connection that appears lower
> in this section:
> 
> http://docbuilder.usersys.redhat.com/19948/#Hello_World_Walkthrough
> 
> but that link fails.   I'd assume the heartbeat documentation would be
> included wherever the documentation for the Connection constructor is.
> I'd link from the Cluster Failover documentation back to the Connection
> documentation also.
> 
> But, again, this is just IMHO.

I've had a close look at the docs, and I did find heartbeat documented in http://docbuilder.usersys.redhat.com/19948/#Connection_Options_Reference

So I think we're covered there.

One thing I noticed is this statement in the table:

By default, TCP retransmission time is around 15 minutes on Linux and 12 seconds on Windows.

15 *minutes*?? <== is that right?

Comment 15 Ken Giusti 2015-07-06 13:15:23 UTC

That doesn't seem correct to me.

The TCP retransmission timer is variable.  It is computed based on the measured round trip time and can vary over time due to network dynamics.  So the application programmer can't really know what the retransmission timer is at any given moment.

The TCP *timeout* - note the difference: the point where TCP has retried as much as it can and essentially gives up and fails the connection - is anywhere from 13 to 30 *minutes*.

See http://man7.org/linux/man-pages/man7/tcp.7.html - the part about tcp_retries2

I think the whole point of setting a messaging connection time out is to override the TCP timeout - to fail faster than TCP would.  Essentially, the application programmer is saying - I don't care if you can recover in 13-30 minutes, if my message doesn't get through in (say) 30 seconds I'm going to consider the connection "out of spec" and fail.

Does that make sense?

So no - I don't know where that recommendation came from.  Perhaps this is a question better asked on the messaging mailing list?  Someone might be able to shed some light on why this recommendation is present.

Comment 16 Ken Giusti 2015-07-21 14:40:33 UTC

In summary - the documentation for connection heartbeats is not correct.  We should drop the recommendation to set heartbeats to 1/2 the tcp retransmit interval.  Applications should be free to determine how long a link remains unresponsive - not TCP.

The semantics column should simply say something like this:

Requests that heartbeats be sent every N seconds. If two successive heartbeats are missed the connection is considered lost and will fail or start the reconnect process if configured to do so.

Comment 18 Jared MORGAN 2015-07-31 03:10:03 UTC

(In reply to Ken Giusti from comment #16)
> In summary - the documentation for connection heartbeats is not correct.  We
> should drop the recommendation to set heartbeats to 1/2 the tcp retransmit
> interval.  Applications should be free to determine how long a link remains
> unresponsive - not TCP.
> 
> The semantics column should simply say something like this:
> 
> Requests that heartbeats be sent every N seconds. If two successive
> heartbeats are missed the connection is considered lost and will fail or
> start the reconnect process if configured to do so.

This recommendation is now included in the docs. Thanks for providing this clarifying text, Ken.

http://docbuilder.usersys.redhat.com/19948/#Connection_Options_Reference

Comment 19 Michal Toth 2015-08-04 08:23:38 UTC

This is not a documentation bugzilla. Please see relevant doc bug 1249942.
A bug has been found; please see comment 11.
Moving back to assigned.

Comment 20 Ken Giusti 2015-08-18 13:33:53 UTC

Pushed a fix for the time change bug upstream:

https://svn.apache.org/viewvc?view=revision&revision=1696415

Comment 21 Ken Giusti 2015-08-18 13:52:57 UTC

Here is the commit info for the change on the MRG qpid git repository:


[kgiusti@localhost qpid (trunk)]$ git log -1
commit c2e8abd958add91b7e99163193d2fe5b65ab9b6e
Author: Ken Giusti <kgiusti>
Date:   Tue Aug 18 13:29:56 2015 +0000

    QPID-6698: use the monotonic clock for AMQP 1.0 idle timeout
    
    git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1696415 13f79535-47bb-0310-9956-ffa450edef68
[kgiusti@localhost qpid (trunk)]$ git remote -v
origin	ssh://git.app.eng.bos.redhat.com/srv/git/rh-qpid.git (fetch)
origin	ssh://git.app.eng.bos.redhat.com/srv/git/rh-qpid.git (push)

Comment 25 Michal Toth 2015-09-03 08:00:00 UTC

Marking as verified on rhel 6 32/64 and rhel 7 using
qpid-cpp-server-0.34-3.el6/7

The issue with setting the time to the future has been resolved.

Comment 27 Ken Giusti 2015-10-06 12:24:30 UTC

The doc text looks good, but I found one bit unclear: the use of "enforce" in the following sentence:

"Note that the broker simply uses what the client sent. It is not able to enforce a value."


The broker does enforce a value - that value being 2x the idle-timeout advertised by the client.  Rather, I think we're trying to explain that the broker doesn't support the _configuration_ of a default idle-timeout on the broker side.  The idle timeout value for the connection is entirely determined by the configuration of the connection made by the client.  There's no way to override this value on the broker side via management.

Comment 28 Zdenek Kraus 2015-10-06 13:20:37 UTC

Scott,

could you please revisit the doc text with respect to Ken's comment 27?

Comment 30 Ken Giusti 2015-10-07 14:07:35 UTC

+1 to the latest doc text - well done.

Comment 32 errata-xmlrpc 2015-10-08 13:09:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-1879.html
